From the logs it seems the files are present on data(21,22,23,24), which are on nas6, while missing on data(17,18,19,20), which are on nas5 (interesting). There is an existing issue where directories do not show up on the mount point if they are not present on the first_up_subvol (longest-living brick), and the current issue looks similar. We will look at the client logs for more information.
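Something like the following (just a rough sketch, to be run on nas5 and on nas6; brick paths as in the volume info below, and the directory here is only an example of a problem directory) would show whether the directory exists on every brick and what DHT layout xattr each brick holds for it:

    #!/bin/bash
    # Rough check: list the directory and dump its DHT layout on each local brick.
    DIR=franco/dir1226/dir25

    for brick in /data*/gvol; do
        echo "== $brick/$DIR =="
        if [ -d "$brick/$DIR" ]; then
            ls -la "$brick/$DIR"
            # Hash range this brick holds for the directory.
            getfattr -d -m . -e hex "$brick/$DIR" | grep trusted.glusterfs.dht
        else
            echo "missing on this brick"
        fi
    done

If the directory or its trusted.glusterfs.dht xattr turned out to be missing on the nas5 bricks, that would fit the first_up_subvol theory above.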
Susant.

----- Original Message -----
From: "Franco Broi" <[email protected]>
To: "Pranith Kumar Karampuri" <[email protected]>
Cc: "Susant Palai" <[email protected]>, [email protected], "Raghavendra Gowdappa" <[email protected]>, [email protected], [email protected], [email protected]
Sent: Wednesday, 4 June, 2014 10:32:37 AM
Subject: Re: [Gluster-users] glusterfsd process spinning

On Wed, 2014-06-04 at 10:19 +0530, Pranith Kumar Karampuri wrote:
> On 06/04/2014 08:07 AM, Susant Palai wrote:
> > Pranith can you send the client and brick logs.
> I have the logs. But I believe for this issue of the directory not listing
> entries, it would help more if we have the contents of that directory on
> all the bricks + their hash values in the xattrs.

Strange thing is, all the invisible files are on the one server (nas6), the other seems OK. I did rm -Rf of /data2/franco/dir* and was left with this one directory - there were many hundreds which were removed successfully.

I've attached listings and xattr dumps.

Cheers,

Volume Name: data2
Type: Distribute
Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
Status: Started
Number of Bricks: 8
Transport-type: tcp
Bricks:
Brick1: nas5-10g:/data17/gvol
Brick2: nas5-10g:/data18/gvol
Brick3: nas5-10g:/data19/gvol
Brick4: nas5-10g:/data20/gvol
Brick5: nas6-10g:/data21/gvol
Brick6: nas6-10g:/data22/gvol
Brick7: nas6-10g:/data23/gvol
Brick8: nas6-10g:/data24/gvol
Options Reconfigured:
nfs.drc: on
cluster.min-free-disk: 5%
network.frame-timeout: 10800
nfs.export-volumes: on
nfs.disable: on
cluster.readdir-optimize: on

Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick nas5-10g:/data17/gvol                     49152   Y       6553
Brick nas5-10g:/data18/gvol                     49153   Y       6564
Brick nas5-10g:/data19/gvol                     49154   Y       6575
Brick nas5-10g:/data20/gvol                     49155   Y       6586
Brick nas6-10g:/data21/gvol                     49160   Y       20608
Brick nas6-10g:/data22/gvol                     49161   Y       20613
Brick nas6-10g:/data23/gvol                     49162   Y       20614
Brick nas6-10g:/data24/gvol                     49163   Y       20621

Task Status of Volume data2
------------------------------------------------------------------------------
There are no active volume tasks

> 
> Pranith
> 
> > Thanks,
> > Susant~
> > 
> > ----- Original Message -----
> > From: "Pranith Kumar Karampuri" <[email protected]>
> > To: "Franco Broi" <[email protected]>
> > Cc: [email protected], "Raghavendra Gowdappa" <[email protected]>, [email protected], [email protected], [email protected], [email protected]
> > Sent: Wednesday, 4 June, 2014 7:53:41 AM
> > Subject: Re: [Gluster-users] glusterfsd process spinning
> > 
> > hi Franco,
> >       CC Devs who work on DHT to comment.
> > 
> > Pranith
> > 
> > On 06/04/2014 07:39 AM, Franco Broi wrote:
> >> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> >>> Franco,
> >>>       Thanks for providing the logs. I just copied over the logs to my
> >>> machine. Most of the logs I see are related to "No such File or
> >>> Directory". I wonder what led to this. Do you have any idea?
> >> No, but I'm just looking at my 3.5 Gluster volume and it has a directory
> >> that looks empty but can't be deleted. When I look at the directories on
> >> the servers there are definitely files in there.
> >> 
> >> [franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> >> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> >> [franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
> >> total 8
> >> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> >> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
> >> 
> >> [root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> >> /data21/gvol/franco/dir1226/dir25:
> >> total 2081
> >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> >> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
> >> 
> >> /data22/gvol/franco/dir1226/dir25:
> >> total 2084
> >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> >> .....
> >> 
> >> Maybe Gluster is losing track of the files??
> >> 
> >>> Pranith
> >>> 
> >>> On 06/02/2014 02:48 PM, Franco Broi wrote:
> >>>> Hi Pranith
> >>>> 
> >>>> Here's a listing of the brick logs, looks very odd especially the size
> >>>> of the log for data10.
> >>>> 
> >>>> [root@nas3 bricks]# ls -ltrh
> >>>> total 2.6G
> >>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> >>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> >>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> >>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> >>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> >>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> >>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> >>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> >>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> >>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> >>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> >>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> >>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> >>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> >>>> 
> >>>> Too big to post everything.
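A quick way to line up the client view against each brick for the undeletable directory, assuming the volume is mounted at /data2 and root ssh access to nas5 and nas6 (a sketch only, not part of the original diagnostics):

    #!/bin/bash
    # Compare the entries the client sees with what each brick actually holds.
    echo "client view:"
    ls -la /data2/franco/dir1226/dir25

    for host in nas5 nas6; do
        echo "== $host =="
        ssh root@"$host" 'for b in /data*/gvol/franco/dir1226/dir25; do
            [ -d "$b" ] || continue
            echo "$b: $(ls "$b" | wc -l) entries"
        done'
    done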
> >>>> 
> >>>> Cheers,
> >>>> 
> >>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> >>>>> ----- Original Message -----
> >>>>>> From: "Pranith Kumar Karampuri" <[email protected]>
> >>>>>> To: "Franco Broi" <[email protected]>
> >>>>>> Cc: [email protected]
> >>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
> >>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>> 
> >>>>>> ----- Original Message -----
> >>>>>>> From: "Franco Broi" <[email protected]>
> >>>>>>> To: "Pranith Kumar Karampuri" <[email protected]>
> >>>>>>> Cc: [email protected]
> >>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> >>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>>> 
> >>>>>>> The volume is almost completely idle now and the CPU for the brick
> >>>>>>> process has returned to normal. I've included the profile and I think it
> >>>>>>> shows the latency for the bad brick (data12) is unusually high, probably
> >>>>>>> indicating the filesystem is at fault after all??
> >>>>>> I am not sure if we can believe the outputs now that you say the brick
> >>>>>> returned to normal. Next time it is acting up, do the same procedure and
> >>>>>> post the result.
> >>>>> On second thought, maybe it's not a bad idea to inspect the log files of
> >>>>> the bricks in nas3. Could you post them?
> >>>>> 
> >>>>> Pranith
> >>>>> 
> >>>>>> Pranith
> >>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> >>>>>>>> Franco,
> >>>>>>>>       Could you do the following to get more information:
> >>>>>>>> 
> >>>>>>>> "gluster volume profile <volname> start"
> >>>>>>>> 
> >>>>>>>> Wait for some time; this will start gathering what operations are coming
> >>>>>>>> to all the bricks.
> >>>>>>>> Now execute "gluster volume profile <volname> info" > /file/you/should/reply/to/this/mail/with
> >>>>>>>> 
> >>>>>>>> Then execute:
> >>>>>>>> gluster volume profile <volname> stop
> >>>>>>>> 
> >>>>>>>> Let's see if this throws any light on the problem at hand.
> >>>>>>>> 
> >>>>>>>> Pranith
> >>>>>>>> ----- Original Message -----
> >>>>>>>>> From: "Franco Broi" <[email protected]>
> >>>>>>>>> To: [email protected]
> >>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> >>>>>>>>> Subject: [Gluster-users] glusterfsd process spinning
> >>>>>>>>> 
> >>>>>>>>> Hi
> >>>>>>>>> 
> >>>>>>>>> I've been suffering from continual problems with my gluster filesystem
> >>>>>>>>> slowing down due to what I thought was congestion on a single brick
> >>>>>>>>> being caused by a problem with the underlying filesystem running slow,
> >>>>>>>>> but I've just noticed that the glusterfsd process for that particular
> >>>>>>>>> brick is running at 100%+, even when the filesystem is almost idle.
> >>>>>>>>> 
> >>>>>>>>> I've done a couple of straces of the brick and another on the same
> >>>>>>>>> server; does the high number of futex errors give any clues as to what
> >>>>>>>>> might be wrong?
> >>>>>>>>> 
> >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>  45.58    0.027554           0    191665     20772 futex
> >>>>>>>>>  28.26    0.017084           0    137133           readv
> >>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
> >>>>>>>>>   0.13    0.000077           3        23           writev
> >>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>> 100.00    0.060458                396081     20772 total
> >>>>>>>>> 
> >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
> >>>>>>>>>   0.40    0.001347           0      4090        26 futex
> >>>>>>>>>   0.35    0.001192           0      5064           readv
> >>>>>>>>>   0.00    0.000000           0        20           writev
> >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>> 100.00    0.336559                 11690        26 total
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> Cheers,
> >>>>>>>>> 
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Gluster-users mailing list
> >>>>>>>>> [email protected]
> >>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>>>>>>>> 
> 
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
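For reference, the diagnostics requested in the thread, collected as a rough sketch. The volume name and brick PID are taken from the status output above; the strace flags are an assumption about how the quoted summaries were produced, and the output file path is arbitrary:

    #!/bin/bash
    # Profile the volume for a while, then capture per-brick FOP statistics.
    gluster volume profile data2 start
    sleep 600                                 # let normal load run for a while
    gluster volume profile data2 info > /tmp/data2-profile.txt
    gluster volume profile data2 stop

    # Syscall summary for a spinning brick process, similar to the tables above.
    # 20608 is the data21 brick PID from the status output; substitute as needed.
    BRICK_PID=20608
    timeout 30 strace -c -f -p "$BRICK_PID"   # -c: summarise, -f: follow threads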
