From the logs it seems the files are present on data(21,22,23,24), which are on nas6, while missing on data(17,18,19,20), which are on nas5 (interesting). There is an existing issue where directories do not show up on the mount point if they are not present on the first_up_subvol (longest-living brick), and the current issue looks similar. We will look at the client logs for more information.
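Something like the following (just a rough sketch, to be run on nas5 and on nas6; brick paths as in the volume info below, and the directory here is only an example of a problem directory) would show whether the directory exists on every brick and what DHT layout xattr each brick holds for it:

    #!/bin/bash
    # Rough check: list the directory and dump its DHT layout on each local brick.
    DIR=franco/dir1226/dir25

    for brick in /data*/gvol; do
        echo "== $brick/$DIR =="
        if [ -d "$brick/$DIR" ]; then
            ls -la "$brick/$DIR"
            # Hash range this brick holds for the directory.
            getfattr -d -m . -e hex "$brick/$DIR" | grep trusted.glusterfs.dht
        else
            echo "missing on this brick"
        fi
    done

If the directory or its trusted.glusterfs.dht xattr turned out to be missing on the nas5 bricks, that would fit the first_up_subvol theory above.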
Susant.

----- Original Message -----
From: "Franco Broi" <[email protected]>
To: "Pranith Kumar Karampuri" <[email protected]>
Cc: "Susant Palai" <[email protected]>, [email protected], "Raghavendra Gowdappa" <[email protected]>, [email protected], [email protected], [email protected]
Sent: Wednesday, 4 June, 2014 10:32:37 AM
Subject: Re: [Gluster-users] glusterfsd process spinning

On Wed, 2014-06-04 at 10:19 +0530, Pranith Kumar Karampuri wrote:
> On 06/04/2014 08:07 AM, Susant Palai wrote:
> > Pranith can you send the client and brick logs.
> I have the logs. But I believe for this issue of the directory not listing
> entries, it would help more if we have the contents of that directory on
> all the bricks + their hash values in the xattrs.

Strange thing is, all the invisible files are on the one server (nas6), the other seems OK. I did rm -Rf of /data2/franco/dir* and was left with this one directory - there were many hundreds which were removed successfully.

I've attached listings and xattr dumps.

Cheers,

Volume Name: data2
Type: Distribute
Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
Status: Started
Number of Bricks: 8
Transport-type: tcp
Bricks:
Brick1: nas5-10g:/data17/gvol
Brick2: nas5-10g:/data18/gvol
Brick3: nas5-10g:/data19/gvol
Brick4: nas5-10g:/data20/gvol
Brick5: nas6-10g:/data21/gvol
Brick6: nas6-10g:/data22/gvol
Brick7: nas6-10g:/data23/gvol
Brick8: nas6-10g:/data24/gvol
Options Reconfigured:
nfs.drc: on
cluster.min-free-disk: 5%
network.frame-timeout: 10800
nfs.export-volumes: on
nfs.disable: on
cluster.readdir-optimize: on

Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick nas5-10g:/data17/gvol                     49152   Y       6553
Brick nas5-10g:/data18/gvol                     49153   Y       6564
Brick nas5-10g:/data19/gvol                     49154   Y       6575
Brick nas5-10g:/data20/gvol                     49155   Y       6586
Brick nas6-10g:/data21/gvol                     49160   Y       20608
Brick nas6-10g:/data22/gvol                     49161   Y       20613
Brick nas6-10g:/data23/gvol                     49162   Y       20614
Brick nas6-10g:/data24/gvol                     49163   Y       20621

Task Status of Volume data2
------------------------------------------------------------------------------
There are no active volume tasks

> 
> Pranith
> 
> > Thanks,
> > Susant~
> > 
> > ----- Original Message -----
> > From: "Pranith Kumar Karampuri" <[email protected]>
> > To: "Franco Broi" <[email protected]>
> > Cc: [email protected], "Raghavendra Gowdappa" <[email protected]>, [email protected], [email protected], [email protected], [email protected]
> > Sent: Wednesday, 4 June, 2014 7:53:41 AM
> > Subject: Re: [Gluster-users] glusterfsd process spinning
> > 
> > hi Franco,
> >       CC Devs who work on DHT to comment.
> > 
> > Pranith
> > 
> > On 06/04/2014 07:39 AM, Franco Broi wrote:
> >> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> >>> Franco,
> >>>       Thanks for providing the logs. I just copied over the logs to my
> >>> machine. Most of the logs I see are related to "No such File or
> >>> Directory". I wonder what led to this. Do you have any idea?
> >> No, but I'm just looking at my 3.5 Gluster volume and it has a directory
> >> that looks empty but can't be deleted. When I look at the directories on
> >> the servers there are definitely files in there.
> >> 
> >> [franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> >> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> >> [franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
> >> total 8
> >> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> >> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
> >> 
> >> [root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> >> /data21/gvol/franco/dir1226/dir25:
> >> total 2081
> >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> >> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
> >> 
> >> /data22/gvol/franco/dir1226/dir25:
> >> total 2084
> >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> >> .....
> >> 
> >> Maybe Gluster is losing track of the files??
> >> 
> >>> Pranith
> >>> 
> >>> On 06/02/2014 02:48 PM, Franco Broi wrote:
> >>>> Hi Pranith
> >>>> 
> >>>> Here's a listing of the brick logs, looks very odd especially the size
> >>>> of the log for data10.
> >>>> 
> >>>> [root@nas3 bricks]# ls -ltrh
> >>>> total 2.6G
> >>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> >>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> >>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> >>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> >>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> >>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> >>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> >>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> >>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> >>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> >>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> >>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> >>>> 
> >>>> Too big to post everything.
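A quick way to line up the client view against each brick for the undeletable directory, assuming the volume is mounted at /data2 and root ssh access to nas5 and nas6 (a sketch only, not part of the original diagnostics):

    #!/bin/bash
    # Compare the entries the client sees with what each brick actually holds.
    echo "client view:"
    ls -la /data2/franco/dir1226/dir25

    for host in nas5 nas6; do
        echo "== $host =="
        ssh root@"$host" 'for b in /data*/gvol/franco/dir1226/dir25; do
            [ -d "$b" ] || continue
            echo "$b: $(ls "$b" | wc -l) entries"
        done'
    done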
> >>>> 
> >>>> Cheers,
> >>>> 
> >>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> >>>>> ----- Original Message -----
> >>>>>> From: "Pranith Kumar Karampuri" <[email protected]>
> >>>>>> To: "Franco Broi" <[email protected]>
> >>>>>> Cc: [email protected]
> >>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
> >>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>> 
> >>>>>> ----- Original Message -----
> >>>>>>> From: "Franco Broi" <[email protected]>
> >>>>>>> To: "Pranith Kumar Karampuri" <[email protected]>
> >>>>>>> Cc: [email protected]
> >>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> >>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>>> 
> >>>>>>> The volume is almost completely idle now and the CPU for the brick
> >>>>>>> process has returned to normal. I've included the profile and I think it
> >>>>>>> shows the latency for the bad brick (data12) is unusually high, probably
> >>>>>>> indicating the filesystem is at fault after all??
> >>>>>> I am not sure if we can believe the outputs now that you say the brick
> >>>>>> returned to normal. Next time it is acting up, do the same procedure and
> >>>>>> post the result.
> >>>>> On second thought, maybe it's not a bad idea to inspect the log files of
> >>>>> the bricks in nas3. Could you post them?
> >>>>> 
> >>>>> Pranith
> >>>>> 
> >>>>>> Pranith
> >>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> >>>>>>>> Franco,
> >>>>>>>>       Could you do the following to get more information:
> >>>>>>>> 
> >>>>>>>> "gluster volume profile <volname> start"
> >>>>>>>> 
> >>>>>>>> Wait for some time; this will start gathering what operations are coming
> >>>>>>>> to all the bricks.
> >>>>>>>> Now execute "gluster volume profile <volname> info" > /file/you/should/reply/to/this/mail/with
> >>>>>>>> 
> >>>>>>>> Then execute:
> >>>>>>>> gluster volume profile <volname> stop
> >>>>>>>> 
> >>>>>>>> Let's see if this throws any light on the problem at hand.
> >>>>>>>> 
> >>>>>>>> Pranith
> >>>>>>>> ----- Original Message -----
> >>>>>>>>> From: "Franco Broi" <[email protected]>
> >>>>>>>>> To: [email protected]
> >>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> >>>>>>>>> Subject: [Gluster-users] glusterfsd process spinning
> >>>>>>>>> 
> >>>>>>>>> Hi
> >>>>>>>>> 
> >>>>>>>>> I've been suffering from continual problems with my gluster filesystem
> >>>>>>>>> slowing down due to what I thought was congestion on a single brick
> >>>>>>>>> being caused by a problem with the underlying filesystem running slow,
> >>>>>>>>> but I've just noticed that the glusterfsd process for that particular
> >>>>>>>>> brick is running at 100%+, even when the filesystem is almost idle.
> >>>>>>>>> 
> >>>>>>>>> I've done a couple of straces of the brick and another on the same
> >>>>>>>>> server; does the high number of futex errors give any clues as to what
> >>>>>>>>> might be wrong?
> >>>>>>>>> 
> >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>  45.58    0.027554           0    191665     20772 futex
> >>>>>>>>>  28.26    0.017084           0    137133           readv
> >>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
> >>>>>>>>>   0.13    0.000077           3        23           writev
> >>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>> 100.00    0.060458                396081     20772 total
> >>>>>>>>> 
> >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
> >>>>>>>>>   0.40    0.001347           0      4090        26 futex
> >>>>>>>>>   0.35    0.001192           0      5064           readv
> >>>>>>>>>   0.00    0.000000           0        20           writev
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>> 100.00    0.336559                 11690        26 total
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> Cheers,
> >>>>>>>>> 
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Gluster-users mailing list
> >>>>>>>>> [email protected]
> >>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>>>>>>>> 
> 
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
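For reference, the diagnostics requested in the thread, collected as a rough sketch. The volume name and brick PID are taken from the status output above; the strace flags are an assumption about how the quoted summaries were produced, and the output file path is arbitrary:

    #!/bin/bash
    # Profile the volume for a while, then capture per-brick FOP statistics.
    gluster volume profile data2 start
    sleep 600                                 # let normal load run for a while
    gluster volume profile data2 info > /tmp/data2-profile.txt
    gluster volume profile data2 stop

    # Syscall summary for a spinning brick process, similar to the tables above.
    # 20608 is the data21 brick PID from the status output; substitute as needed.
    BRICK_PID=20608
    timeout 30 strace -c -f -p "$BRICK_PID"   # -c: summarise, -f: follow threads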
