On 09/29/2011 01:44 PM, David Miller wrote:
On Thu, Sep 29, 2011 at 1:32 PM, David Miller <[email protected]> wrote:

    Couldn't you accomplish the same thing with flashcache?
    https://github.com/facebook/flashcache/


I should expand on that a little bit.  Flashcache is a kernel module
created by Facebook that uses the device mapper interface in Linux to
provide an SSD cache layer to any block device.

What I think would be interesting is using flashcache with a PCIe SSD as
the caching device.  That would add about $500-$600 to the cost of each
brick node but should be able to buffer the active IO from the spinning
media pretty well.

Erp ... low-end PCIe flash with decent performance starts much higher than $500-600 USD.

Something like this:
http://www.amazon.com/OCZ-Technology-Drive-240GB-Express/dp/B0058RECUE
or something from FusionIO if you want something that's aimed more at
the enterprise.

Flashcache is reasonably good, but there are many variables in using it, and it's designed for a different use case. For most people writeback mode may be reasonable, but other use cases would require different configurations.
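
As a rough sketch of what that setup looks like (device names, brick paths, and the choice of writeback mode below are placeholders, not a recommendation; check the options against the flashcache version you actually build):

    # create a writeback cache device named "cachedev": SSD in front of the spinning brick disk
    # (-p back = writeback; -p thru or -p around are the more conservative modes)
    flashcache_create -p back cachedev /dev/ssd_device /dev/brick_disk

    # the cached device appears under /dev/mapper; the brick file system goes on top of it
    mkfs.xfs /dev/mapper/cachedev
    mount /dev/mapper/cachedev /bricks/brick01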

This said, please understand that flashcache (and L2ARC, and other similar things) is *not* a silver bullet (i.e. not a magical thing that will instantly make something far better at no cost or effort). These tools do introduce additional complexity and additional tuning points.

The thing you cannot get rid of, the network traversal, accounts for much of the performance degradation on small files. Putting the file system on a RAM disk (if that were possible; tmpfs doesn't support the xattrs Gluster needs) wouldn't make the system much faster for small files. Eliminating the network traversal by doing local distributed caching of metadata on the client side ... could ... but that would be a huge new complication, and I'd argue it probably isn't worth it.
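
If you do want to experiment with different backing file systems for a brick, a quick sanity check of whether one supports the trusted.* extended attributes Gluster relies on looks roughly like this (paths are placeholders, run as root):

    touch /mnt/candidate_fs/xattr_test
    setfattr -n trusted.glusterfs.test -v works /mnt/candidate_fs/xattr_test
    getfattr -n trusted.glusterfs.test /mnt/candidate_fs/xattr_test
    # "Operation not supported" from setfattr means the file system
    # (e.g. tmpfs here) can't hold Gluster's metadata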

In the near term, small-file performance is going to be bad. You might be able to play some games to improve it (L2ARC etc. could help in some respects, but it won't be universally much better).
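
For the ZFS case, the L2ARC route is just a cache vdev added to the pool; something along these lines (pool and device names are placeholders):

    # add an SSD as a level-2 ARC (read cache) to an existing pool
    zpool add tank cache /dev/ssd_device

    # watch how the cache device fills and gets hit
    zpool iostat -v tank 5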

What matters most is a very good design on the storage backend (we are biased, given what we sell/support), very good networking, and a very good Gluster implementation and tuning. It's very easy to end up with very slow performance by missing critical elements. We field many inquiries that usually start out with "we built our own and the performance isn't that good." You won't get good performance out of the cluster file system if the underlying file system and storage design can't deliver it in the first place.
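
To give a flavor of the Gluster-side tuning, these are the sorts of volume options people end up looking at (the volume name and values are placeholder examples, not recommendations; measure before and after):

    gluster volume info myvol

    # client-side read cache and brick io thread count
    gluster volume set myvol performance.cache-size 256MB
    gluster volume set myvol performance.io-thread-count 16

    # aggregate small writes before they cross the network
    gluster volume set myvol performance.write-behind-window-size 1MB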

This said, please understand that there is a (significant) performance cost to all those nice features in ZFS, and there is a reason why it's not generally considered a high-performance file system. So if you start building with it, you shouldn't assume the whole will be faster than the sum of the parts. It might be worse.

This is a caution from someone who has tested and shipped many different file systems in the past, ZFS included, on Solaris and other platforms. There is a very significant performance penalty you pay for using some of these features, and you have to decide whether that penalty is worth it.
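
If you do build on ZFS, the trade-off shows up directly in the per-dataset properties; a minimal sketch (the pool/dataset name is a placeholder, and whether any of these are appropriate depends entirely on your data):

    # see what the dataset is currently set to
    zfs get atime,compression,checksum,primarycache tank/bricks

    # each of these buys speed by giving up a ZFS feature
    zfs set atime=off tank/bricks
    zfs set compression=off tank/bricks
    # caching only metadata in ARC frees RAM but hurts cached data reads
    zfs set primarycache=metadata tank/bricks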


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: [email protected]
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
