Re: [Gluster-users] Gluster 2.0.2 locking up issues

Daniel Jordan Bambach Thu, 18 Jun 2009 05:20:19 -0700

Well one of the servers just locked up again (completely).

All accesses were occurring on the other machine at the time. We had a moment when a directory on the still-running server went to 'Device or Resource Busy'. I restarted Gluster on that machine to clear the issue, then noticed the second had died (not sure if it happened at the same time or not).

I'm trying to update the dump_caches value to 3, but it isn't letting me for some reason (permission denied as root?)

Will adding DEBUG to the glusterfs command line give me more information across the whole process, rather than the trace (below), which isn't giving anything away?

[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 162: (loc {path=/www/site/rebuild2008/faber, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 162: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37883182, st_mode=40775, st_nlink=24, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 163: (loc {path=/www/site/rebuild2008/faber/site-media, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 163: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238048, st_mode=40777, st_nlink=21, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 164: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 164: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37884374, st_mode=40777, st_nlink=4, st_uid=504, st_gid=501, st_rdev=0, st_size=114688, st_blksize=4096, st_blocks=240})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 165: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 165: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238105, st_mode=40777, st_nlink=3, st_uid=504, st_gid=501, st_rdev=0, st_size=479232, st_blksize=4096, st_blocks=952})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 166: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs/185_jpg_130x400_q85.jpg, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 166: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=7089866, st_mode=100644, st_nlink=1, st_uid=504, st_gid=501, st_rdev=0, st_size=10919, st_blksize=4096, st_blocks=32})

---ends--


On 18 Jun 2009, at 11:53, Daniel Jordan Bambach wrote:

Will do, though I recently added those lines to be explicit about behaviour (I had no options set before at all, leaving it to the default of 16 threads). I will remove them and specify the default of 16 to see if that helps.
I'm adding:

volume trace
 type debug/trace
 subvolumes cache
end-volume
to both sides now as well, so next time (if any) it locks up perhaps there will be some more info.
Thanks Shehjar


On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:
Daniel Jordan Bambach wrote:
I'm experiencing various locking up issues ranging from Gluster locking up ('ls'ing the mount hangs), to the whole machine locking up under load.
My current config is below (two servers, AFRing).
I would love to be able to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.
There is approx 12GB of files, and to stress test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).
I'm finding it very hard to collect and debug information of any use, as there is no crash log and no errors in the volume log.
Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?
volume posix
type storage/posix
option directory /home/export
end-volume
volume locks
type features/locks
subvolumes posix
end-volume
volume brick
type performance/io-threads
subvolumes locks
option autoscaling on
option min-threads 8
option max-threads 32
end-volume

I see that max-threads will never exceed 32, which is
a reasonable value and should work fine in most cases, but considering
some of the other reports we've been getting, could you please try again
but without the autoscaling turned on?

It is off by default, so you can simply set the number of threads
you need by:

option thread-count <COUNT>

...instead of the three "option" lines above.
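
For illustration, the io-threads volume quoted above would then read roughly as follows. The count of 16 is only an assumed value (the default mentioned elsewhere in this thread), not a tuned recommendation:

```
volume brick
type performance/io-threads
subvolumes locks
# assumed fixed count; replaces the autoscaling, min-threads and max-threads options
option thread-count 16
end-volume
```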

Thanks
Shehjar
volume server
type protocol/server
option transport-type tcp
option auth.addr.brick.allow *
subvolumes brick
end-volume
volume latsrv2
type protocol/client
option transport-type tcp
option remote-host latsrv2
option remote-subvolume brick
end-volume
volume afr
type cluster/replicate
subvolumes brick latsrv2
option read-subvolume brick
end-volume
volume writebehind
type performance/write-behind
option cache-size 2MB
subvolumes afr
end-volume
volume cache
type performance/io-cache
option cache-size 32MB
option priority *.pyc:4,*.html:3,*.php:2,*:1
option cache-timeout 5
subvolumes writebehind
end-volume
_______________________________________________
Gluster-users mailing list
[email protected]
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users