Hi,

Greg has spent some time fixing up a Hadoop FileSystem module that allows a Hadoop cluster to use Ceph in place of HDFS. It hasn't seen extensive testing or benchmarking (we don't use Hadoop internally), but it passes our basic tests and its performance seems similar to that of HDFS.
The main reason Hadoop users might be interested is the scaling problems people are running into with the HDFS namenode. Ceph's MDS maintains minimal per-inode metadata (no per-file block lists) and doesn't require all of it to be held in memory. Perhaps most importantly, it has a clustered architecture that allows metadata to be spread across tens or possibly hundreds of nodes.

Anyway, we're very much interested in seeing Ceph perform well for Hadoop. The Hadoop module can be found in src/client/hadoop and has been submitted for inclusion in the next Hadoop release. It relies on libceph, which can be built and installed from source or installed from a .deb package.

sage

_______________________________________________
Ceph-devel mailing list
Ceph-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ceph-devel