There are plenty of arguments in every direction for caching above the DB, in 
the DB, or under the DB/in the FS.  I have a significant interest in supporting 
large heaps and reducing GC issues within the HBase RegionServer, and I am 
already running with local fs reads.  I don't think a faster dfs makes HBase 
caching irrelevant or the conversation a non-starter.

To get back to the original question, I ended up trying this once.  I wrote a 
rough implementation of a slab allocator a few months ago to dive in and see 
what it would take.  The big challenge is KeyValue and its various comparators. 
The ByteBuffer API can be maddening at times, but it can be done.  I ended up 
somewhere slightly more generic: KeyValue took a ByteBlock, which carried a ref 
count and a reference to the allocator it came from, in addition to a 
ByteBuffer.

The easy way to rely on DirectByteBuffers and the like would be to make a copy 
on read into a normal byte[]; then there's no need to worry about ref counting 
or revamping KV.  Of course, that comes at the cost of short-term allocations. 
In my experience, you can tune the GC around this and the cost really becomes CPU.
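The copy-on-read path is basically this (just a sketch, assuming the cached 
block lives in an off-heap DirectByteBuffer):

import java.nio.ByteBuffer;

// Copy-on-read sketch: the cached block stays off-heap, and each read copies
// the bytes of interest into a plain byte[] so the existing KeyValue code
// (and GC behavior) stays untouched.  Illustrative only.
public final class CopyOnRead {

  /** Copy [offset, offset + length) out of the off-heap block into a heap array. */
  public static byte[] read(ByteBuffer directBlock, int offset, int length) {
    byte[] onHeap = new byte[length];          // short-lived, young-gen allocation
    ByteBuffer dup = directBlock.duplicate();  // don't disturb the shared buffer's position
    dup.position(offset);
    dup.get(onHeap, 0, length);
    return onHeap;
  }
}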

I'm in the process of re-implementing some of this on top of the HFile v2 that 
is coming soon.  Once that goes in, this gets much easier at the HFile and 
block cache level (a new wrapper around ByteBuffer called HFileBlock can be 
used for ref counting and such, instead of introducing huge changes just for 
the caching stuff).
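At the block cache level I'm picturing something along these lines (again just 
a sketch, not the real HFile v2 API; the wrapper interface and cache methods 
here are made up for illustration):

import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of how a block-cache lookup could hand out ref-counted
// wrappers once blocks are represented by an HFileBlock-style class.
// Types and method names are illustrative, not the actual HBase code.
public final class SlabBlockCache {

  /** Stand-in for the ref-counted ByteBuffer wrapper described above. */
  public interface RefCountedBlock {
    ByteBuffer buffer();
    void retain();
    void release();
  }

  private final ConcurrentHashMap<String, RefCountedBlock> blocks =
      new ConcurrentHashMap<>();

  /** Caller must call release() on the returned block when finished with it. */
  public RefCountedBlock getBlock(String cacheKey) {
    RefCountedBlock block = blocks.get(cacheKey);
    if (block != null) {
      block.retain();   // pin the slab so eviction can't recycle it mid-read
    }
    return block;
  }

  public void cacheBlock(String cacheKey, RefCountedBlock block) {
    block.retain();                       // the cache holds its own reference
    RefCountedBlock prev = blocks.put(cacheKey, block);
    if (prev != null) {
      prev.release();                     // drop the reference to the replaced block
    }
  }
}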

JG

 
> -----Original Message-----
> From: Ted Dunning [mailto:tdunn...@maprtech.com]
> Sent: Saturday, July 09, 2011 11:14 PM
> To: dev@hbase.apache.org
> Subject: Re: Converting byte[] to ByteBuffer
> 
> No.  The JNI is below the HDFS compatible API.  Thus the changed code is in
> the hadoop.jar and associated jars and .so's that MapR supplies.
> 
> The JNI still runs in the HBase memory image, though, so it can make data
> available faster.
> 
> The cache involved includes the cache of disk blocks (not HBase memcache
> blocks) in the JNI and in the filer sub-system.
> 
> The detailed reasons why more caching in the file system and less in HBase
> makes the overall system faster are not completely worked out, but the
> general outlines are pretty clear.  There are likely several factors at work 
> in
> any case including less GC cost due to smaller memory foot print, caching
> compressed blocks instead of Java structures and simplification due to a
> clean memory hand-off with associated strong demarcation of where
> different memory allocators have jurisdiction.
> 
> On Sat, Jul 9, 2011 at 3:48 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com
> > wrote:
> 
> > I'm a little confused, I was told none of the HBase code changed with
> > MapR, if the HBase (not the OS) block cache has a JNI implementation
> > then that part of the HBase code changed.
> > On Jul 9, 2011 11:19 AM, "Ted Dunning" <tdunn...@maprtech.com> wrote:
> > > MapR does help with the GC because it *does* have a JNI interface
> > > into an external block cache.
> > >
> > > Typical configurations with MapR trim HBase down to the minimal
> > > viable
> > size
> > > and increase the file system cache correspondingly.
> > >
> > > On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen <
> > jason.rutherg...@gmail.com
> > >> wrote:
> > >
> > >> MapR doesn't help with the GC issues. If MapR had a JNI interface
> > >> into an external block cache then that'd be a different story. :)
> > >> And I'm sure it's quite doable.
> > >>
> >