Hi,

From Joe's findings it seems the IO subsystem has little impact on performance. If a ramdisk doesn't help with small files and directory listings, the problem is probably Gluster itself.

That being said, IMO this is a general limitation of the design. There is no central lock manager, and last time I checked Gluster had to stat() the entry on each brick to find the most recent copy. FUSE overhead and context switches make things worse, but to what extent, and which part is responsible for the high CPU usage and high latency, is still unknown. AFAIK no one has measured this in detail, or the findings were such that internal policy is not to publish the results.
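To put rough numbers on that yourself, here is a minimal sketch (plain Python, nothing Gluster-specific; the mount point is a hypothetical placeholder) that times the per-entry stat() cost of a directory listing, which is essentially what `ls -l` pays on every entry. Running it against the gluster mount and against a local directory shows how much the lookup path costs:

    import os
    import time

    MOUNT_DIR = "/mnt/gluster/testdir"   # hypothetical path; point at your volume

    entries = os.listdir(MOUNT_DIR)
    start = time.time()
    for name in entries:
        os.stat(os.path.join(MOUNT_DIR, name))   # one lookup per entry, like `ls -l`
    elapsed = time.time() - start

    print("%d entries, %.1f ms total, %.3f ms per stat()" % (
        len(entries), elapsed * 1e3, elapsed * 1e3 / max(len(entries), 1)))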

IIRC there was/is a stat cache translator, but that is just a workaround and will lead to stale data.
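For illustration only (this is a toy sketch of the trade-off, not the actual translator): a client-side stat cache with a TTL removes the per-lookup round trips at the price of exactly that staleness:

    import os
    import time

    class StatCache(object):
        """Toy TTL stat cache -- a sketch of the trade-off, NOT the actual
        gluster translator.  Within `ttl` seconds a cached result is returned
        even if another client changed the file in the meantime: stale data."""

        def __init__(self, ttl=1.0):
            self.ttl = ttl
            self._cache = {}          # path -> (timestamp, stat result)

        def stat(self, path):
            now = time.time()
            hit = self._cache.get(path)
            if hit is not None and now - hit[0] < self.ttl:
                return hit[1]         # cache hit: no round trip, possibly stale
            st = os.stat(path)        # cache miss: pay the full lookup cost
            self._cache[path] = (now, st)
            return st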

cheers
 Paul

On 06.11.2012 10:35, Fernando Frediani (Qube) wrote:
Joe,

I don't think we have to accept this, because it is not an acceptable 
situation. I have seen countless people complaining about this problem for a 
while, and it seems no improvements have been made.
As for the ramdisk: although it might help, it looks more like a chewing-gum 
fix. I have seen other distributed filesystems that don't suffer from the same 
problem, so why does Gluster have to?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Joe Landman
Sent: 05 November 2012 15:07
To: [email protected]
Subject: Re: [Gluster-users] Very slow directory listing and high CPU usage on replicated volume

On 11/05/2012 09:57 AM, harry mangalam wrote:
Jeff Darcy wrote a nice piece on his hekafs blog about 'the importance
of staying sequential', which is essentially about the contention for
disk heads between data IO and journal IO.
<http://hekafs.org/index.php/2012/11/the-importance-of-staying-sequential/>
(Also, congrats on the Linux Journal article on the glupy
python/gluster approach.)

We've been experimenting with SSDs on ZFS (using the SSDs for the ZIL
(journal)), and while it's provided a little bit of a boost, it has not
been dramatic.  Ditto XFS.  However, we did not stress it at all with
heavy loads

An issue you have to worry about is whether the SSD streaming read/write path 
is around the same speed as the spinning rust.  If so, this design would be a 
wash at best.
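Back-of-envelope arithmetic makes the point; the numbers below are illustrative 2012-era figures, not measurements:

    # Illustrative 2012-era numbers, not measurements:
    ssd_stream_mb_s  = 450                 # one SATA SSD, sequential
    disk_stream_mb_s = 140                 # one 7200rpm disk, sequential
    disks_in_array   = 4                   # small stripe behind the brick

    array_mb_s = disk_stream_mb_s * disks_in_array        # ~560 MB/s aggregate
    print("SSD journal: %d MB/s, disk array: %d MB/s" % (ssd_stream_mb_s, array_mb_s))
    # If the array already streams as fast as the SSD, moving the journal to
    # the SSD buys little for sequential-heavy loads -- a wash at best.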

Also, if this is under Linux, the ZFS pathways may not be terribly well 
optimized.

in a gluster env, and I'm now thinking that is where you would
see the improvement. (See Jeff's graph on how the difference in
threads/load affects IOPS.)

Is anyone running a gluster system with the underlying XFS writing the
journal to SSDs?  If so, any improvement?  I would have expected to
hear about this as a recommended architecture for gluster if it had
performed MUCH better, but

Yes, we've done this, and do this on occasion.  No, there's no dramatic speed 
boost for most use cases.

Unfortunately, heavy metadata ops on GlusterFS are going to be slow, and we 
simply have to accept that for the near term.  This appears to be independent 
of the particular file system, or even the storage technology.
If you aren't doing metadata-heavy ops, then you should be in good shape.  It 
appears that mirroring magnifies the cost of metadata-heavy ops significantly.
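A rough worked example of why mirroring hurts here (all numbers assumed, for illustration only): every entry in a listing needs a lookup on each replica so the client can pick the freshest copy, so the round trips multiply:

    # All numbers assumed, for illustration only:
    entries  = 10000      # files in the directory being listed
    replicas = 2          # replica count of the volume
    rtt_ms   = 0.2        # assumed per-brick network round trip

    # Each entry needs a lookup on every replica so the client can pick the
    # freshest copy; done serially that is:
    worst_case_s = entries * replicas * rtt_ms / 1000.0
    print("worst case: %.0f s spent in lookup round trips alone" % worst_case_s)
    # -> 4 s for a single `ls -l`, before any FUSE or context-switch overhead.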

For laughs, about a year ago, we set up large RAM disks (tmpfs) in a cluster, 
put a loopback device on them, then a file system, then GlusterFS atop this.  
It should have been very fast for metadata ops, but it wasn't.  It gave some 
improvement, but not significant enough that we'd recommend "heroic" designs 
like this.
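For anyone wanting to repeat the experiment, here is a sketch of that stack (hypothetical sizes and paths; run as root, and note that everything evaporates on reboot):

    import subprocess

    def sh(cmd):
        """Run a shell command, echoing it first."""
        print("+ " + cmd)
        subprocess.check_call(cmd, shell=True)

    sh("mkdir -p /mnt/ramdisk /mnt/brick")
    sh("mount -t tmpfs -o size=8g tmpfs /mnt/ramdisk")    # RAM-backed store
    sh("truncate -s 7g /mnt/ramdisk/backing.img")         # file to back the loop device
    sh("losetup /dev/loop0 /mnt/ramdisk/backing.img")     # loopback device on tmpfs
    sh("mkfs.xfs -q /dev/loop0")                          # file system on the loop device
    sh("mount /dev/loop0 /mnt/brick")                     # brick directory for gluster
    # ...then build the volume on top, e.g.:
    #   gluster volume create ramtest replica 2 hostA:/mnt/brick hostB:/mnt/brick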

If your workloads are metadata heavy, we'd recommend local IO, and if you are 
mostly small IO, an SSD.

--
https://ssl.facebook.com/help/contact.php?show_form=delete_account
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
