I think Gluster also supports large amounts of data, but as I understand it,
Gluster nodes are meant to be "bricks"; that is, they are only meant for
storage.

In MapReduce use, people talk about Map/Reduce jobs running "near the
storage". What does that mean?

       -   They run on the same node that has the disks, so they are able to
retrieve data fast.
       -   They run close to the node that has the data, which can reduce
network traffic (a rough sketch of how Hadoop exposes this is below).
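
As a concrete illustration (just a sketch against the stock Hadoop client
API, with a made-up class name and input path), a client or scheduler can ask
the NameNode which DataNodes hold each block of a file, and the M/R framework
uses the same information to place map tasks on or near those hosts:

    // Sketch: list which hosts hold each block of a file in HDFS.
    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(args[0]);          // e.g. /data/input.log
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
          // Each block reports the DataNode hosts that store a replica.
          System.out.println("block at offset " + block.getOffset()
              + " lives on " + Arrays.toString(block.getHosts()));
        }
      }
    }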

I think that with the declining cost of servers and network switch ports,
this should become less of an issue. I personally like a Gluster-like
architecture: storage bricks, files striped across multiple nodes, and
automatic self-healing. I am assuming these features exist in all of the file
systems, but Gluster seems to be low cost and professionally supported, as is
Cloudera.
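
For what it's worth, HDFS takes a different route to the same end: instead of
striping, it replicates whole blocks and re-replicates them automatically when
a node dies. A minimal sketch, assuming a running HDFS cluster and a made-up
path, of setting the replication factor for one file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical file; ask for three copies of each of its blocks.
        // The NameNode heals under-replicated blocks on its own if a
        // DataNode holding a copy goes down.
        fs.setReplication(new Path("/data/important.log"), (short) 3);
      }
    }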

Kevin




On Wed, May 12, 2010 at 5:10 AM, Buttler, David <buttl...@llnl.gov> wrote:

> If you are opening up the discussion to HDFS, I would really like to think
> more deeply as to why HDFS is a better choice for some workloads than, say,
> Lustre or GPFS.
> The things I like about HDFS over Lustre are that
> 1) it is easier to set up
> 2) HDFS by default has local storage (as opposed to the storage-area
> networks more typical of Lustre deployments), making data locality
> for M/R jobs standard
> 3) HDFS lives in the Java world [which could be interpreted as a drawback I
> suppose]
>
> Dave
>
>
> -----Original Message-----
> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> Sent: Tuesday, May 11, 2010 3:29 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Using HBase on other file systems
>
> Hey Edward,
>
> > I do think that if you compare GoogleFS to HDFS, GFS looks more full
> > featured.
> >
>
> What features are you missing? Multi-writer append was explicitly called
> out
> by Sean Quinlan as a bad idea, and rolled back. From internal conversations
> with Google engineers, erasure coding of blocks suffered a similar fate.
> Native client access would certainly be nice, but FUSE gets you most of the
> way there. Scalability/availability of the NN, RPC QoS, alternative block
> placement strategies are second-order features which didn't exist in GFS
> until later in its lifecycle of development as well. HDFS is following a
> similar path and has JIRA tickets with active discussions. I'd love to hear
> your feature requests, and I'll be sure to translate them into JIRA
> tickets.
>
> > I do believe my logic is reasonable. HBase has a lot of code designed
> > around HDFS.  We know these tickets that get cited all the time, for
> > better random reads, or for sync() support. HBase gets the benefits of
> > HDFS and has to deal with its drawbacks. Other key value stores handle
> > storage directly.
> >
>
> Sync() works and will be in the next release, and its absence was simply a
> result of the youth of the system. Now that that limitation has been
> removed, please point to another place in the code where using HDFS rather
> than the local file system is forcing HBase to make compromises. Your
> initial attempts on this front (caching, HFile, compactions) were, I hope,
> debunked by my previous email. It's also worth noting that Cassandra does
> all three, despite managing its own storage.
>
> I'm trying to learn from this exchange and always enjoy understanding new
> systems. Here's what I have so far from your arguments:
> 1) HBase inherits both the advantages and disadvantages of HDFS. I clearly
> agree on the general point; I'm pressing you to name some specific
> disadvantages, in hopes of helping prioritize our development of HDFS. So
> far, you've named things which are either a) not actually disadvantages or
> b) no longer true. If you can come up with the disadvantages, we'll certainly
> take them into account. I've certainly got a number of them on our roadmap.
> 2) If you don't want to use HDFS, you won't want to use HBase. Also
> certainly true, but I'm not sure there's much to learn from this assertion.
> I'd once again ask: why would you not want to use HDFS, and what
> is your choice in its stead?
>
> Thanks,
> Jeff
>
