----- Original Message -----
> From: "Atin Mukherjee" <amukh...@redhat.com>
> To: "Oleksandr Natalenko" <oleksa...@natalenko.name>, "Nithya Balachandran" <nbala...@redhat.com>,
> "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Shyam Ranganathan" <srang...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Tuesday, October 18, 2016 9:58:07 PM
> Subject: Re: [Gluster-devel] Spurious failure of ./tests/bugs/glusterd/bug-913555.t
>
> Final reminder before I take out the test case from the test file.
>
> On Thursday 13 October 2016, Atin Mukherjee <amukh...@redhat.com> wrote:
>
> >
> >
> > On Wednesday 12 October 2016, Atin Mukherjee <amukh...@redhat.com> wrote:
> >
> >> So the test fails (intermittently) in check_fs which tries to do a df on
> >> the mount point for a volume which is carved out of three bricks from 3
> >> nodes and one node is completely down. A quick look at the mount log
> >> reveals the following:
> >>
> >> [2016-10-10 13:58:59.279446]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-913555.t: TEST: 48 0 check_fs /mnt/glusterfs/0 ++++++++++
> >> [2016-10-10 13:58:59.287973] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 0-patchy-client-2: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
> >> [2016-10-10 13:58:59.288326] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-patchy-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
> >> [2016-10-10 13:58:59.288352] W [MSGID: 109005] [dht-selfheal.c:2102:dht_selfheal_directory] 0-patchy-dht: Directory selfheal failed: 1 subvolumes down.Not fixing. path = /, gfid =
> >> [2016-10-10 13:58:59.288643] W [MSGID: 114031] [client-rpc-fops.c:2930:client3_3_lookup_cbk] 0-patchy-client-2: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
> >> [2016-10-10 13:58:59.288927] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Stale file handle)
> >> [2016-10-10 13:58:59.288949] W [fuse-bridge.c:2597:fuse_opendir_resume] 0-glusterfs-fuse: 7: OPENDIR (00000000-0000-0000-0000-000000000001) resolution failed
> >> [2016-10-10 13:58:59.289505] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Stale file handle)
> >> [2016-10-10 13:58:59.289524] W [fuse-bridge.c:3137:fuse_statfs_resume] 0-glusterfs-fuse: 8: STATFS (00000000-0000-0000-0000-000000000001) resolution fail
> >>
> >> DHT team - are these anomalies expected here? I also see opendir and
> >> statfs failing here too.

I'm not sure whether the anomalies are expected, but they have no bearing on statfs: irrespective of the self-heal result, lookup succeeds as long as it succeeds on at least one subvolume. So I don't see a DHT issue here. However, the logs show that resolution of the root gfid failed, and hence statfs couldn't be resumed. It would be worthwhile to look into where and why the lookup on gfid 0x1 failed.

> >>
> >
> > Any luck with this? I don't see any relevance of having a check_fs test
> > w.r.t. the bug this test case is tagged to. If I don't hear back on this
> > in a few days, I'll go ahead and remove this check from the test to avoid
> > the spurious failure.
> >
> >
> >>
> >>
> >> On Wed, Oct 12, 2016 at 12:18 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
> >>
> >>> I will take a look at it in some time.
> >>>
> >>> On Wed, Oct 12, 2016 at 12:08 PM, Oleksandr Natalenko <oleksa...@natalenko.name> wrote:
> >>>
> >>>> Hello.
> >>>>
> >>>> Vijay asked me to drop a note about the spurious failure of the
> >>>> ./tests/bugs/glusterd/bug-913555.t test. Here are the examples:
> >>>>
> >>>> * https://build.gluster.org/job/centos6-regression/1069/consoleFull
> >>>> * https://build.gluster.org/job/centos6-regression/1076/consoleFull
> >>>>
> >>>> Could someone take a look at it?
> >>>>
> >>>> Also, the last two test runs were broken because of this:
> >>>>
> >>>> ===
> >>>> Slave went offline during the build
> >>>> ===
> >>>>
> >>>> See these builds for details:
> >>>>
> >>>> * https://build.gluster.org/job/centos6-regression/1077/consoleFull
> >>>> * https://build.gluster.org/job/centos6-regression/1078/consoleFull
> >>>>
> >>>> Was that intentional?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Regards,
> >>>> Oleksandr
> >>>> _______________________________________________
> >>>> Gluster-devel mailing list
> >>>> Gluster-devel@gluster.org
> >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> --Atin
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> --Atin
> >>
> >
> >
> > --
> > --Atin
> >
>
>
> --
> --Atin