Before recommending Gluster, I suggest you set up a test cluster and then randomly kill bricks.
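If it helps, here is a minimal sketch of that kind of drill in Java; the brick host names, the mount point, the passwordless-ssh assumption, and the replicated test volume are placeholders of mine, not details from this thread. It kills the glusterfsd brick daemon on one randomly chosen server and then checks whether reads and writes through the client mount still succeed.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    // Minimal brick-failure drill for a small replicated Gluster test
    // volume. Assumes passwordless ssh to each brick host and that the
    // volume is mounted on this client at MOUNT_POINT; host names and
    // paths are placeholders.
    public class BrickKillDrill {
        private static final List<String> BRICK_HOSTS =
                Arrays.asList("gluster1", "gluster2", "gluster3", "gluster4");
        private static final Path MOUNT_POINT = Paths.get("/mnt/glusterfs");

        public static void main(String[] args) throws Exception {
            String victim = BRICK_HOSTS.get(new Random().nextInt(BRICK_HOSTS.size()));
            System.out.println("Killing brick daemon on " + victim);

            // glusterfsd is the per-brick server process; killing it
            // simulates an unplanned brick failure.
            new ProcessBuilder("ssh", victim, "pkill", "-9", "glusterfsd")
                    .inheritIO().start().waitFor();

            // With one brick down, a replicated volume should still serve
            // reads and writes through the client mount.
            Path probe = MOUNT_POINT.resolve("drill-" + System.currentTimeMillis());
            try {
                Files.write(probe, "still alive?".getBytes(StandardCharsets.UTF_8));
                String back = new String(Files.readAllBytes(probe), StandardCharsets.UTF_8);
                System.out.println("Client I/O survived the brick kill: " + back);
            } catch (IOException e) {
                System.out.println("Client I/O failed with a brick down: " + e);
            }
        }
    }

Run it repeatedly while a realistic workload is going and see how the volume behaves during the failure and during the subsequent self-heal.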
Also, as pointed out in another mail, you'll want to colocate
TaskTrackers on Gluster bricks to get I/O locality, yet there is no way
for Gluster to export stripe locations back to Hadoop (see the
block-location sketch at the end of this thread). It seems a poor
choice.

   - Andy

> From: Edward Capriolo
> Subject: Re: Using HBase on other file systems
> To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> Date: Wednesday, May 12, 2010, 6:38 AM
>
> On Tuesday, May 11, 2010, Jeff Hammerbacher <ham...@cloudera.com> wrote:
> > Hey Edward,
> >
> >> I do think that if you compare GoogleFS to HDFS, GFS looks more
> >> full featured.
> >
> > What features are you missing? Multi-writer append was explicitly
> > called out by Sean Quinlan as a bad idea, and rolled back. From
> > internal conversations with Google engineers, erasure coding of
> > blocks suffered a similar fate. Native client access would certainly
> > be nice, but FUSE gets you most of the way there.
> > Scalability/availability of the NN, RPC QoS, and alternative block
> > placement strategies are second-order features which didn't exist in
> > GFS until later in its lifecycle of development as well. HDFS is
> > following a similar path and has JIRA tickets with active
> > discussions. I'd love to hear your feature requests, and I'll be
> > sure to translate them into JIRA tickets.
> >
> >> I do believe my logic is reasonable. HBase has a lot of code
> >> designed around HDFS. We know the tickets that get cited all the
> >> time, for better random reads or for sync() support. HBase gets the
> >> benefits of HDFS and has to deal with its drawbacks. Other
> >> key-value stores handle storage directly.
> >
> > Sync() works and will be in the next release, and its absence was
> > simply a result of the youth of the system. Now that that limitation
> > has been removed, please point to another place in the code where
> > using HDFS rather than the local file system is forcing HBase to
> > make compromises. Your initial attempts on this front (caching,
> > HFile, compactions) were, I hope, debunked by my previous email.
> > It's also worth noting that Cassandra does all three, despite
> > managing its own storage.
> >
> > I'm trying to learn from this exchange and always enjoy
> > understanding new systems. Here's what I have so far from your
> > arguments:
> > 1) HBase inherits both the advantages and disadvantages of HDFS. I
> > clearly agree on the general point; I'm pressing you to name some
> > specific disadvantages, in hopes of helping prioritize our
> > development of HDFS. So far, you've named things which are either
> > a) not actually disadvantages or b) no longer true. If you can come
> > up with the disadvantages, we'll certainly take them into account.
> > I've certainly got a number of them on our roadmap.
> > 2) If you don't want to use HDFS, you won't want to use HBase. Also
> > certainly true, but I'm not sure there's much to learn from this
> > assertion. I'd once again ask: why would you not want to use HDFS,
> > and what is your choice in its stead?
> >
> > Thanks,
> > Jeff
>
> Jeff,
>
> Let me first mention that you have described some things as fixed that
> are only fixed in trunk. I consider trunk futureware, and I do not
> like to have temporal conversations. Even when trunk becomes current,
> there is no guarantee that the entire problem is solved. After all,
> appends were fixed in 0.19, or not, or again?
>
> I rescanned the GFS white paper to support my argument that HDFS is
> stripped down.
> Found:
>   Writes at offset ARE supported
>   Checkpoints
>   Application-level checkpoints
>   Snapshot
>   Shadow read-only master
>
> HDFS chose the features it wanted and ignored the others; that is why
> I called it a pure MapReduce implementation.
>
> My main point is that HBase by nature needs high-speed random reads
> and random writes, and HDFS by nature is bad at these things. If you
> cannot keep a high cache hit rate via a large in-RAM block cache,
> HBase is going to slam HDFS doing large block reads for small parts
> of files.
>
> So you ask me what I would use instead. I do not think there is a
> viable alternative in the 100 TB and up range, but I do think that
> for people in the 20 TB range something like Gluster, which is very
> performance focused, might deliver amazing results in some
> applications.
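On the stripe-location point raised above, here is a minimal sketch of
the call the MapReduce layer uses to find out where data lives. It
assumes a Hadoop client with the cluster configuration on its
classpath; the default path is only an example, not something from this
thread.

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Prints the block-to-host mapping that Hadoop's scheduler uses
    // when placing map tasks near their data. On HDFS each block lists
    // the DataNodes holding it; the generic FileSystem implementation
    // that backs a FUSE-mounted volume such as Gluster reports a single
    // "localhost" location for the whole file.
    public class ShowBlockLocations {
        public static void main(String[] args) throws Exception {
            Path path = new Path(args.length > 0 ? args[0]
                    : "/hbase/example-table/example-hfile");
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(path);
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        Arrays.toString(block.getHosts()));
            }
        }
    }

Against HDFS this prints the hosts holding each block; against a plain
local or FUSE mount it prints one "localhost" entry covering the whole
file, which is why colocating TaskTrackers on bricks does not by itself
buy real locality.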