(added hail-devel to CC, with permission)



Rick Peralta wrote:
Hi Jeff,

Is there someone taking point on chunkd development?

That's me, for the moment.  :)

In general, since Pete and I are developers on the Linux kernel -- one of the largest open source projects in the world -- I think importing its tried-and-true requirements for code maintainership makes sense:

Once you are intimately familiar with a codebase, and can answer others' questions about its design and code, you have reached the level where you could be a maintainer.

But more importantly -- that is not important!

Just contribute code, because that is how open source projects advance in a particular direction anyway. Linus Torvalds does not take point on the Linux kernel -- the people who actually write code do that. Linus writes maybe 0.01% of the code these days. The people who write the code make the roadmap.


To get to 10 GbE there may be a variety of problems:
1) If there are data copies, the multiplicative effect can saturate the memory system.
2) If the primary storage is rotating media (disk), it would take 40+ devices to keep up (assuming large stripes); see the arithmetic sketch below.
3) There are artifacts of the VM system that show up at high bandwidth, especially if there is a lot of RAM.
4) The transaction frequency can become problematic, depending on the application.
5) Running a single network transport at 10 GbE can be challenging.

Indeed, scaling up the networking and storage can have plenty of implications.

We saw some of this when I worked with NIC hardware manufacturers to add the first 10 GbE NIC drivers to the Linux kernel.
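Point (2) above is easy to sanity-check. A minimal back-of-the-envelope sketch, assuming a cheap SATA disk sustains roughly 30 MB/s of streaming throughput (that figure is an assumption; adjust for your hardware):

#include <stdio.h>

int main(void)
{
        const double link_mbyte = 10.0 * 1000.0 / 8.0; /* 10 Gb/s ~= 1250 MB/s */
        const double disk_mbyte = 30.0;   /* assumed per-disk streaming rate */

        printf("disks needed to keep up with the link: ~%.0f\n",
               link_mbyte / disk_mbyte);
        return 0;
}

That prints ~42 disks, which agrees with the "40+ devices" figure above.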

chunkd is intentionally message-based, which implies that non-TCP protocols could readily be bolted on, for use in data centers with 10 GbE networks (AMQP? RDMA?).
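To illustrate what "message-based" buys us: a transport could be reduced to a small ops table, so that supporting RDMA or AMQP means supplying a new table rather than touching the chunk protocol. This is a hypothetical sketch -- the struct and member names are invented, not taken from the chunkd source:

#include <stddef.h>

/* Hypothetical transport abstraction; names invented for illustration. */
struct chunk_transport {
        const char *name;               /* "tcp", "rdma", "amqp", ... */
        int  (*connect)(void *priv, const char *host, int port);
        int  (*send_msg)(void *priv, const void *msg, size_t len);
        int  (*recv_msg)(void *priv, void *buf, size_t buflen);
        void (*close)(void *priv);
};

Because each request/response is a self-contained message, none of the protocol logic needs to know which table sits behind it.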


Is there an application profile that might be used as a performance metric? Something like total volume size (all media), total number of chunks, distribution of chunk sizes, aggregate bandwidth, et cetera.

This is unfortunately going to vary wildly depending on the application using chunkd (and the application using that, in turn).

To take tabled as an example, and assuming a "standard cloud node" hardware setup,

* total single-node volume size:  one cheap SATA hard drive
* total number of chunks:
        (total number of tabled objects) / (number of storage nodes)
* distribution of chunk sizes:  dependent upon the application using tabled
* aggregate bandwidth:  dependent upon the application using tabled
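
As an invented illustration of the chunk-count formula: 10 million tabled objects spread evenly across 20 storage nodes works out to roughly 500,000 chunks per node.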


If it matters, I investigated building something very much like chunkd a while back, and it had some stringent performance criteria. It was not clear what general application demand there is. Is there a resource for getting a sense of where there is real need?

You should read the GoogleFS paper referenced on the chunkd wiki page (http://labs.google.com/papers/gfs-sosp2003.pdf). It describes the purpose and use of a chunk server, in the context of distributed cloud storage.

The demand is largely internal -- other Project Hail projects and outside distributed-storage applications should use chunkd in the creation of their own cloud-based services.

The intent is for tabled, nfs4d, and other distributed-storage projects to communicate with multiple chunkd instances on multiple nodes, to accomplish replicated, highly available distributed storage.
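As a rough sketch of what that could look like from the application side -- the types, function names, and majority-ack policy below are all invented for illustration, and are not the real chunkd client API:

#include <stdio.h>
#include <stddef.h>

struct chunk_node { const char *host; };  /* stand-in for a real handle */

/* Stub standing in for a real chunkd client PUT call (invented name). */
static int chunk_put(struct chunk_node *node, const char *key,
                     const void *data, size_t len)
{
        printf("PUT %s (%zu bytes) -> %s\n", key, len, node->host);
        return 0;
}

/* Store one object on every replica; call it a success only when a
 * majority of the nodes acknowledge the write. */
static int replicated_put(struct chunk_node *nodes, int n_nodes,
                          const char *key, const void *data, size_t len)
{
        int i, acks = 0;

        for (i = 0; i < n_nodes; i++)
                if (chunk_put(&nodes[i], key, data, len) == 0)
                        acks++;

        return (acks > n_nodes / 2) ? 0 : -1;
}

int main(void)
{
        struct chunk_node nodes[] = { {"node1"}, {"node2"}, {"node3"} };
        const char buf[] = "hello";

        return replicated_put(nodes, 3, "bucket/key", buf, sizeof(buf));
}

The majority-ack rule is just one possible policy; the point is that the application, not chunkd itself, decides how writes fan out across nodes.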

If there is some specialized use of chunkd that you have in mind, we're interested in hearing that, too... I certainly want to enable as many applications as possible with these projects.

Regards,

        Jeff



