I think Gluster also supports large amounts of data, but as I understand it, Gluster nodes are meant to be "bricks"; that is, they are only meant for storage.

In Map/Reduce use, people talk about jobs running near the storage. What does that mean?
- They run on the same node that has the disks, so they are able to retrieve data fast.
- They run close to the node that has the data, which can reduce network traffic.
I think that with the declining cost of servers and network switch ports, this should become less of an issue.
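As a concrete sketch of what "running near the storage" means in HDFS terms, a client can ask the NameNode which hosts hold the replicas of each block of a file; the Map/Reduce scheduler uses the same information to place map tasks. (This is just an illustration against the public FileSystem API; the input path is made up.)

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocality {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Made-up input path, purely for illustration.
        FileStatus status = fs.getFileStatus(new Path("/data/input.txt"));

        // Ask the NameNode where the replicas of each block live; a map
        // task placed on one of these hosts reads its data from local disk.
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
          System.out.println("offset " + block.getOffset() + " -> "
              + Arrays.toString(block.getHosts()));
        }
      }
    }

A task scheduled on any other host has to pull its block over the network, which is the traffic I mean above.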
I personally like a Gluster-like architecture: storage bricks, files striped across multiple nodes, and automatic self-healing. I am assuming these features exist in all of these file systems, but Gluster seems to be low cost and professionally supported, as is Cloudera.

Kevin

On Wed, May 12, 2010 at 5:10 AM, Buttler, David <buttl...@llnl.gov> wrote:
> If you are opening up the discussion to HDFS, I would really like to think
> more deeply about why HDFS is a better choice for some workloads than,
> say, Lustre or GPFS.
> The things I like about HDFS over Lustre are that:
> 1) it is easier to set up
> 2) HDFS uses local storage by default (as opposed to the storage area
> networks more typical of Lustre deployments), making data locality for M/R
> jobs standard
> 3) HDFS lives in the Java world [which could be interpreted as a drawback,
> I suppose]
>
> Dave
>
>
> -----Original Message-----
> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> Sent: Tuesday, May 11, 2010 3:29 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Using HBase on other file systems
>
> Hey Edward,
>
> > I do think that if you compare GoogleFS to HDFS, GFS looks more full
> > featured.
>
> What features are you missing? Multi-writer append was explicitly called
> out by Sean Quinlan as a bad idea, and rolled back. From internal
> conversations with Google engineers, erasure coding of blocks suffered a
> similar fate. Native client access would certainly be nice, but FUSE gets
> you most of the way there. Scalability/availability of the NN, RPC QoS,
> and alternative block placement strategies are second-order features which
> didn't exist in GFS until later in its lifecycle of development either.
> HDFS is following a similar path and has JIRA tickets with active
> discussions. I'd love to hear your feature requests, and I'll be sure to
> translate them into JIRA tickets.
>
> > I do believe my logic is reasonable. HBase has a lot of code designed
> > around HDFS. We know these tickets that get cited all the time, for
> > better random reads, or for sync() support. HBase gets the benefits of
> > HDFS and has to deal with its drawbacks. Other key-value stores handle
> > storage directly.
>
> Sync() works and will be in the next release; its absence was simply a
> result of the youth of the system. Now that that limitation has been
> removed, please point to another place in the code where using HDFS rather
> than the local file system is forcing HBase to make compromises. Your
> initial attempts on this front (caching, HFile, compactions) were, I hope,
> debunked by my previous email. It's also worth noting that Cassandra does
> all three, despite managing its own storage.
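>
> For the record, here is roughly what sync() buys HBase's write-ahead log
> (a minimal sketch against the append-branch FSDataOutputStream API, where
> sync() is what later releases expose as hflush(); the path is made up):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class WalSyncSketch {
>     public static void main(String[] args) throws Exception {
>       FileSystem fs = FileSystem.get(new Configuration());
>       FSDataOutputStream out = fs.create(new Path("/hbase/demo-wal"));
>
>       // Write a WAL-style edit and flush it to the datanode pipeline.
>       out.writeBytes("put row1 cf:col value1\n");
>       out.sync(); // the edit is now visible to readers and is not lost
>                   // if this client (say, a region server) crashes
>
>       out.close();
>     }
>   }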
>
> I'm trying to learn from this exchange and always enjoy understanding new
> systems. Here's what I have so far from your arguments:
> 1) HBase inherits both the advantages and disadvantages of HDFS. I clearly
> agree on the general point; I'm pressing you to name some specific
> disadvantages, in hopes of helping prioritize our development of HDFS. So
> far, you've named things which are either a) not actually disadvantages or
> b) no longer true. If you can come up with real disadvantages, we'll
> certainly take them into account; I've certainly got a number of them on
> our roadmap already.
> 2) If you don't want to use HDFS, you won't want to use HBase. This is
> also certainly true, but I'm not sure there's much to learn from the
> assertion. I'd once again ask: why would you not want to use HDFS, and
> what is your choice in its stead?
>
> Thanks,
> Jeff