On Thursday, 13.10.2011, 13:48 +0100, Steve Loughran wrote:
> On 12/10/11 17:31, Gerd Stolpmann wrote:
> > Hi,
> >
> > This is about the release of Plasma-0.4, an alternate and independent
> > implementation of map/reduce with its own DFS. This might also be
> > interesting for Hadoop users and developers, because this project
> > incorporates a number of new ideas. So far, Plasma has proven to work
> > on smaller clusters and shows good signs of being scalable. The
> > design of PlasmaFS is certainly superior to that of HDFS - I did not
> > want a quick'n'dirty solution, so please have a look at how to do it
> > right.
> >
> > Concerning the features, these two pages compare Plasma and Hadoop:
> >
> > http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmafs_and_hdfs.html
>
> - without block checksums your code contains assumptions about HDD
> integrity that do not stand up to the classic works by Pinheiro or
> Schroeder. Essentially you appear to be assuming that HDDs don't
> corrupt data, yet both HDDs and their interconnects can play up. For a
> recent summary of Hadoop integrity, I would point you at [Loughran2011]
>
> http://www.slideshare.net/steve_l/did-you-reallywantthatdata
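Interesting points. Checksums are now on my TODO list. To make the plan
concrete, here is a minimal OCaml sketch of the idea - not actual Plasma
code, the function names are hypothetical, and the stdlib's Digest (MD5)
stands in for whatever checksum PlasmaFS would really record:

  (* Sketch only: a digest per block, recorded at write time and
     checked again on read. *)
  let block_digest (data : string) : Digest.t =
    Digest.string data

  (* [stored] is the digest recorded when the block was written; a
     mismatch means the HDD or its interconnect corrupted the data. *)
  let verify_block ~(data : string) ~(stored : Digest.t) : bool =
    Digest.string data = stored

Where to store the digests - next to the blocks or in the metadata - is
then a separate design question.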
> -Hadoop NNs benefit from SSD too.

Well, "benefit" was meant here as "massive benefit": Plasma syncs after
each commit. Admittedly, being dependent on SSD technology is a weak
point, although a manageable one.

> -auth and security have improved recently, though I'd still run it in
> a private subnet just to be sure
>
> > http://plasma.camlcity.org/plasma/dl/plasma-0.4/doc/html/Plasmamr_and_hadoop.html
> >
> > I hope you see what the point is.
>
> Again, support for small block sizes is relevant in small situations.
> In larger clusters you will not only have larger block sizes; if you
> do work on small blocks, the sheer number of task trackers reporting
> back to the JT can overload it.

This is not the whole story. I have recently written an article about
this point:

http://blog.camlcity.org/blog/plasma4.html

Of course, if you want to support smaller block sizes, you have to
change other things, too. For example, PlasmaFS supports block list
compression. Also, Plasma's map/reduce creates far fewer tasks than
Hadoop's, and contacts the namenode less frequently. That is ultimately
what I meant by "support": you have to compensate somehow for the
additional overhead that small blocks induce.

> > I have currently only limited resources for testing my
> > implementation. If there is anybody interested in testing on bigger
> > clusters, please let me know.
>
> That's one of the issues with the Plasma design: I'm not sure how well
> things like Posix semantics, esp. locking and writes with offsets,
> scale. That's why the very large filesystems, HDFS included, tend to
> drop them. Look at how much effort it took to get Append to work
> reliably.

You know, I'm a functional programmer, and that really helped in
getting this right. E.g. for offset writes there are three important
conceptual points (see the sketch after this list):

- Offset writes can be implemented by allocating replacement blocks
  (i.e. we make a copy instead of mutating existing blocks). Very
  FP-ish: create a new version of the thing instead of overwriting it.

- It is then only a matter of supporting the right data structures.
  Plasma uses a specially crafted FP-style immutable data type for
  representing block lists. It is highly efficient, and provably
  correct.

- Finally, these complex data structures must be made persistent.
  Plasma builds upon transactions that are isolated from each other.
  This is also very FP-ish: we just write a new version, and replace
  the old one atomically.
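A minimal OCaml sketch of this scheme (the map-based block list and the
allocator are illustrative placeholders, not Plasma's actual data
structures):

  (* Block lists modeled as an immutable map from block index to
     block id. *)
  module IntMap = Map.Make (Int)

  type block_list = int IntMap.t

  (* Hypothetical allocator handing out fresh block ids. *)
  let alloc_block : unit -> int =
    let next = ref 0 in
    fun () -> incr next; !next

  (* An offset write never mutates an existing block: it allocates a
     replacement block and returns a *new* block list. Readers of the
     old list are unaffected. *)
  let write_at (bl : block_list) ~(index : int) : block_list * int =
    let fresh = alloc_block () in
    (IntMap.add index fresh bl, fresh)

  (* Committing makes the new version visible in one step; in the real
     system the transaction machinery performs this atomic swap. *)
  let commit (current : block_list ref) (updated : block_list) : unit =
    current := updated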
I don't think there is any scalability issue here (except perhaps that
you then really need complex machinery like transactions, and that is
no longer a luxury).

> Without evidence of working at scale, I'm not sure how the claim "the
> design of Plasma is certainly superior to HDFS" is defensible. Sorry.

Well, I cannot prove right now how well all this scales. I agree this
is important, but I am currently unable to jump over this barrier (lack
of hardware). All I can do is test with small clusters (the largest so
far was 4 machines) and draw conclusions. These tests (with additional
clients simulated) make me quite sure it would also work with 40
machines, but I cannot say where the limit is beyond which you would
see a non-linear slowdown.

Also, let me say that I do not agree with the assumption that only
scalability counts. Quality has many dimensions. How useful is a highly
scalable system when it does not support the features you need? You
see, I'm focusing on a feature set that is recognizably different from
HDFS's.

> That said, using SunOS RPC/NFS as an FS protocol is nice as it does
> make mounting straightforward. And as locking isn't guaranteed in NFS,
> you may be able to get away without it.

NFS is only a secondary protocol here - the primary PlasmaFS protocol
has more features, like SQL-style transactions. But of course it has
some similarity with NFS, and supports all the features you ultimately
need for NFS. Adding locking (lockf) wouldn't be so difficult as such;
the challenge is rather to make it fast (locks need to be persistent in
a DFS). The simple locking method based on exclusive file creation
already works.
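That method is just the classic O_EXCL trick; a minimal OCaml sketch of
it (the paths and helper names are hypothetical, using the stdlib's
Unix module):

  (* The exclusive create either succeeds or fails atomically, so
     whoever manages to create the lock file holds the lock. *)
  let try_lock (path : string) : bool =
    try
      let fd =
        Unix.openfile path [ Unix.O_WRONLY; Unix.O_CREAT; Unix.O_EXCL ]
          0o644
      in
      Unix.close fd;
      true
    with Unix.Unix_error (Unix.EEXIST, _, _) -> false

  (* Releasing the lock is just deleting the file. *)
  let unlock (path : string) : unit =
    Unix.unlink path

The point is that the create succeeds for exactly one client and fails
for all others, which is what makes this simple method safe without a
separate lock server.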
Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    [email protected]
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in OCaml? Gerd Stolpmann can help you.
------------------------------------------------------------