Hi,

This is about the release of Plasma-0.4, an alternative and independent implementation of map/reduce with its own DFS. It might also be interesting for Hadoop users and developers, because this project incorporates a number of new ideas. So far, Plasma has proven to work on smaller clusters and shows good signs of being scalable. The design of PlasmaFS is certainly superior to that of HDFS. I did not want a quick'n'dirty solution, so please have a look at how to do it right.
Concerning the features, these two pages compare Plasma and Hadoop:

http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmafs_and_hdfs.html
http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmamr_and_hadoop.html

I hope you see the point. I currently have only limited resources for testing my implementation. If anybody is interested in testing on bigger clusters, please let me know.

--

Plasma consists of two parts (for now): Plasma MapReduce, a map/reduce compute framework, and PlasmaFS, the underlying distributed filesystem. Plasma MapReduce is a distributed implementation of the map/reduce algorithm scheme written in Ocaml; PlasmaFS is also written in Ocaml.

The PlasmaFS approach in particular differs from HDFS in numerous ways:

* Data blocks are preallocated, and PlasmaFS takes care of block placement
* Blocklists are extent-based
* Metadata is stored in a PostgreSQL db (you need an SSD to get good performance, however)
* 2-phase commit is used to distribute the metadata db
* The full set of file access functions is supported, including random writes
* File accesses can be transaction-based
* Shared memory can be used to speed up the data path to locally stored data blocks
* We _think_ it is not possible to corrupt the namenode by accident or by crashes
* PlasmaFS volumes can be mounted directly via NFS (we support full POSIX semantics, including random writes)
* Symlinks are supported
* PlasmaFS uses ONC RPC as its protocol rather than home-grown protocols. A security module is available.
* We got rid of multi-threading

User programs need not be written in Ocaml, as files are accessible via NFS, and Plasma also supports a streaming mode. (But yes, it is nice to program map/reduce in a functional programming language!)

Both pieces of software are bundled together in one download.
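To illustrate what "extent-based blocklists" means in general, here is a minimal OCaml sketch. It is a hypothetical illustration, not PlasmaFS code: the type `extent` and the functions `extents_of_blocks` and `block_at` are made up for this example. Instead of keeping one entry per data block, runs of consecutive block numbers are stored as (start, length) pairs, so a file whose blocks were allocated contiguously needs only a handful of entries.

```ocaml
(* Hypothetical extent: a run of [len] consecutive physical blocks
   starting at block number [start]. Illustration only, not the
   actual PlasmaFS data structure. *)
type extent = { start : int; len : int }

(* Compress a list of physical block numbers (in file order) into
   extents: consecutive block numbers are merged into one extent. *)
let extents_of_blocks (blocks : int list) : extent list =
  let rec go acc cur = function
    | [] -> List.rev (match cur with None -> acc | Some e -> e :: acc)
    | b :: rest ->
      (match cur with
       | Some e when b = e.start + e.len ->
         (* [b] directly follows the current extent: extend it *)
         go acc (Some { e with len = e.len + 1 }) rest
       | Some e ->
         (* gap: close the current extent and start a new one *)
         go (e :: acc) (Some { start = b; len = 1 }) rest
       | None -> go acc (Some { start = b; len = 1 }) rest)
  in
  go [] None blocks

(* Map logical block [i] of the file to its physical block number,
   or [None] if [i] is outside the file. *)
let block_at (exts : extent list) (i : int) : int option =
  let rec find exts i =
    match exts with
    | [] -> None
    | e :: rest ->
      if i < e.len then Some (e.start + i) else find rest (i - e.len)
  in
  if i < 0 then None else find exts i
```

For example, `extents_of_blocks [100; 101; 102; 103; 200; 201; 57]` yields three extents instead of seven entries, and `block_at` maps logical block 5 of that file to physical block 201.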
The project page with further links is http://projects.camlcity.org/projects/plasma.html

There is now also a homepage at http://plasma.camlcity.org

This is an early alpha release (0.4). Many things already work, and you can already run distributed map/reduce jobs. However, it is in no way complete.

Plasma is installable via GODI for Ocaml 3.12.

For discussions on specifics of Plasma there is a separate mailing list: https://godirepo.camlcity.org/mailman/listinfo/plasma-list

Gerd

--
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany
Creator of GODI and camlcity.org.
Contact details: http://www.camlcity.org/contact.html
Company homepage: http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------
