Dear Brian and Paul,
Thanks for reporting your NFS performance degradation problems; I'm glad I'm not the only one seeing this. My 20-node storage cluster has a number of fairly standard distributed-replicated volumes; I don't use striping.
> I've also been considering writing a cron job to fix this - have you
> made any progress on this, anything to report?
I made my compute cluster nodes part of the storage cluster a couple of months ago as described here:

http://community.gluster.org/a/nfs-performance-with-fuse-client-redundancy/

A few days ago I set up a cron job to restart glusterd on the compute nodes every day at about 2AM. So far there haven't been any reported problems, and long-running jobs have been unaffected. I thought this would be less disruptive than automatically restarting glusterd on the storage servers, because those do a lot more than just provide NFS.

I have been using the GlusterFS servers to export NFS to less important machines, but I now plan to use the compute nodes for all NFS exports in order to take advantage of the daily glusterd restart. This isn't ideal, because the compute nodes get very busy at times and tend to suffer more downtime than the storage servers. I thought about having a dedicated compute server just for GlusterFS exports, but I don't have enough in the budget for that at the moment. My other worry is that other GlusterFS-related processes on the storage servers, not just NFS, will slow down with use.
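In case it's useful, the restart is just a one-line crontab entry along the lines of the sketch below. The file name and log path are hypothetical, and it assumes a Red Hat style init script; adjust the service invocation for your distribution, and bear in mind that any NFS clients of that node will see a brief interruption while glusterd restarts.

    # /etc/cron.d/glusterd-restart (hypothetical file name)
    # Restart glusterd at 2AM daily to clear the degraded NFS state.
    0 2 * * * root /etc/init.d/glusterd restart >> /var/log/glusterd-restart.log 2>&1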

> What sort of tasks are you using your gluster for?
The compute cluster is mainly used to run various climate and meteorology related models and associated data analysis and processing applications, all reading from and writing to GlusterFS volumes.
> Ours is for a render farm, so we see a very large number of
> mounts/unmounts as render nodes mount various parts of the filesystem.
> I wonder if this has anything to do with it; is your use case anything
> similar?
I don't think our models and applications do a lot of mounting and unmounting; volumes usually stay mounted while compute cluster jobs are using the data, and there are also quite a lot of interactive shells keeping volumes mounted for long periods.

-Dan.

On 04/23/2012 08:00 PM, [email protected] wrote:
Date: Mon, 23 Apr 2012 19:24:14 +0100
From: Paul Simpson<[email protected]>
Subject: Re: [Gluster-users] Frequent glusterd restarts needed to
        avoid NFS performance degradation
To: Brian Cipriano<[email protected]>
Cc: [email protected]
Message-ID:
        <CAOFxjOTGSS3mFve=EktgAZRaQz3XiZLoZU-EvEByCV6H=m1...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Just like to add that we sometimes need to restart glusterd on our servers too - again, on a render farm that hammers our 4-server dist/repl setup heavily.

-p


On 23 April 2012 15:38, Brian Cipriano<[email protected]>  wrote:
Hi Dan - I've seen this problem too. I agree with everything you've
described - seems to happen more quickly on more heavily used volumes, and
a restart fixes it right away. I've also been considering writing a cron job
to fix this - have you made any progress on this, anything to report?

I'm running a fairly simple distributed, non-replicated volume across two
servers. What sort of tasks are you using your gluster for? Ours is for a
render farm, so we see a very large number of mounts/unmounts as render
nodes mount various parts of the filesystem. I wonder if this has anything
to do with it; is your use case anything similar?

- brian


On 4/17/12 7:30 PM, Dan Bretherton wrote:
Dear All-
I find that I have to restart glusterd every few days on my servers to
stop NFS performance from becoming unbearably slow.  When the problem
occurs, volumes can take several minutes to mount and there are long delays
responding to "ls".   Mounting from a different server, i.e. one not
normally used for NFS export, results in normal NFS access speeds.  This
doesn't seem to have anything to do with load because it happens whether or
not there is anything running on the compute servers.  Even when the system
is mostly idle there are often a lot of glusterfsd processes running, and
on several of the servers I looked at this evening there is a process
called glusterfs using 100% of one CPU.  I can't find anything unusual in
nfs.log or etc-glusterfs-glusterd.vol.log on the servers affected.
  Restarting glusterd seems to stop this strange behaviour and make NFS
access run smoothly again, but this usually only lasts for a day or two.

This behaviour is not necessarily related to the length of time since
glusterd was started, but has more to do with the amount of work the
GlusterFS processes on each server have to do.  I use a different server to
export each of my 8 different volumes, and the NFS performance degradation
seems to affect the most heavily used volumes more than the others.  I
really need to find a solution to this problem; all I can think of doing is
setting up a cron job on each server to restart glusterd every day, but I
am worried about what side effects that might have.  I am using GlusterFS
version 3.2.5.  All suggestions would be much appreciated.

Regards,
Dan.
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
