I think your ideas here are useful, and it is sad the community mostly ignored them.
Extents are powerful and remove the HDFS flaw that requires one block == one file on a node, which means a contiguous region on a node is never more than one block in size. Functional data structures (a.k.a. immutable or persistent ones) translate more easily, and with far fewer lines of code, to disk-backed structures for logging state changes. Features such as append, random write, and truncation are significantly simpler to implement with the design here.

-Scott

On 10/12/11 9:31 AM, "Gerd Stolpmann" <[email protected]> wrote:

>Hi,
>
>This is about the release of Plasma-0.4, an alternate and independent
>implementation of map/reduce with its own dfs. This might also be
>interesting for Hadoop users and developers, because this project
>incorporates a number of new ideas. So far, Plasma has proven to work on
>smaller clusters and shows good signs of being scalable. The design of
>PlasmaFS is certainly superior to that of HDFS - I did not want a
>quick'n'dirty solution, so please have a look how to do it right.
>
>Concerning the features, these two pages compare Plasma and Hadoop:
>
>http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmafs_and_hdfs.html
>
>http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmamr_and_hadoop.html
>
>I hope you see where the point is.
>
>I have currently only limited resources for testing my implementation.
>If there is anybody interested in testing on bigger clusters, please let
>me know.
>
>--
>
>Plasma consists of two parts (for now), namely Plasma MapReduce, a
>map/reduce compute framework, and PlasmaFS, the underlying distributed
>filesystem.
>
>Plasma MapReduce is a distributed implementation of the map/reduce
>algorithm scheme written in Ocaml. PlasmaFS is the underlying
>distributed filesystem, also written in Ocaml.
>Especially the PlasmaFS approach has numerous differences compared to HDFS:
>
> * Data blocks are preallocated, and PlasmaFS takes care of block
>   placement
> * Blocklists are extent-based
> * Metadata is stored in a PostgreSQL db (you need an SSD for
>   getting good performance, however)
> * 2-phase commit is used to distribute the metadata db
> * the full set of file access functions is supported, including
>   random writes
> * file accesses can be transaction-based
> * shared memory can be used for speeding up the data path to
>   locally stored data blocks
> * we _think_ it is not possible to corrupt the namenode by
>   accident or by crashes
> * PlasmaFS volumes can be directly mounted via NFS (we support
>   full POSIX semantics, including random writes)
> * There are symlinks.
> * PlasmaFS uses ONCRPC as protocol and not home-grown protocols.
>   A security module is available.
> * We got rid of multi-threading
>
>There is no need that user programs are written in Ocaml, as files are
>accessible via NFS, and Plasma also supports a streaming mode. (But yes,
>it is nice to program map/reduce in a functional programming language!)
>
>Both pieces of software are bundled together in one download. The
>project page with further links is
>
>http://projects.camlcity.org/projects/plasma.html
>
>There is now also a homepage at
>
>http://plasma.camlcity.org
>
>This is an early alpha release (0.4). A lot of things work already, and
>you can already run distributed map/reduce jobs. However, it is in no
>way complete.
>
>Plasma is installable via GODI for Ocaml 3.12.
>
>For discussions on specifics of Plasma there is a separate mailing list:
>
>https://godirepo.camlcity.org/mailman/listinfo/plasma-list
>
>Gerd
>--
>------------------------------------------------------------
>Gerd Stolpmann, Darmstadt, Germany [email protected]
>Creator of GODI and camlcity.org.
>Contact details: http://www.camlcity.org/contact.html
>Company homepage: http://www.gerd-stolpmann.de
>*** Searching for new projects! Need consulting for system
>*** programming in Ocaml? Gerd Stolpmann can help you.
>------------------------------------------------------------
>
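
To illustrate the extent-based blocklists discussed in this thread, here is a minimal Python sketch. It is not PlasmaFS code (PlasmaFS is written in Ocaml and keeps its metadata in PostgreSQL); the `Extent` record and the `to_extents`, `locate`, and `truncate` helpers are hypothetical names chosen for the example. It assumes a file's blocks are given as (node, block index) pairs in file order. It shows how contiguous runs collapse into single entries, and how truncation can be done by building a new, shorter list instead of mutating the old one.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Extent:
    """A contiguous run of blocks on one node (hypothetical layout)."""
    node: str    # datanode identifier
    start: int   # first block index on that node
    length: int  # number of contiguous blocks in the run


def to_extents(blocks: Iterable[Tuple[str, int]]) -> List[Extent]:
    """Coalesce (node, block_index) pairs, given in file order, into extents."""
    extents: List[Extent] = []
    for node, index in blocks:
        last = extents[-1] if extents else None
        if (last is not None and last.node == node
                and last.start + last.length == index):
            # The block directly follows the previous run: grow that run.
            extents[-1] = Extent(node, last.start, last.length + 1)
        else:
            extents.append(Extent(node, index, 1))
    return extents


def locate(extents: List[Extent], file_block: int) -> Tuple[str, int]:
    """Map a block offset within the file to (node, block_index)."""
    remaining = file_block
    for ext in extents:
        if remaining < ext.length:
            return ext.node, ext.start + remaining
        remaining -= ext.length
    raise IndexError("block offset past end of file")


def truncate(extents: List[Extent], n_blocks: int) -> List[Extent]:
    """Return a new extent list keeping only the first n_blocks blocks."""
    kept: List[Extent] = []
    for ext in extents:
        if n_blocks <= 0:
            break
        take = min(ext.length, n_blocks)
        kept.append(Extent(ext.node, ext.start, take))
        n_blocks -= take
    return kept
```

Because `Extent` is frozen and `truncate` returns a fresh list, the previous blocklist stays valid until the new one is committed. That is the property that makes this kind of structure straightforward to log to disk.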
