> When running on top of MapR, HBase has fast cached access to locally stored
> files; the MapR client ensures that. Likewise, HDFS should also ensure that
> local reads are fast and come out of cache as necessary, e.g. the kernel
> block cache.

Agreed!  However, I don't see how that's possible today.  E.g., it'd
require more of a byte-buffer-style API to HDFS, with random reads that
don't go through streams.  It's easy to add.
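
Roughly what I'm picturing, as a sketch (the interface name and signature
below are made up for illustration; nothing like it exists in the HDFS API
today):

import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical positional-read API for HDFS (illustration only).  The
 * point is that a random read fills a caller-supplied buffer, possibly a
 * direct buffer, at a given file offset instead of going through a
 * stream's position.
 */
public interface PositionedBufferRead {

  /**
   * Read up to dst.remaining() bytes starting at the given file offset
   * into dst.  Returns the number of bytes read, or -1 at end of file.
   * No stream position is involved, so concurrent readers don't block
   * each other, and a local cached block can be copied straight in.
   */
  int read(long position, ByteBuffer dst) throws IOException;
}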

I think the biggest wins for HBase with MapR are the lack of the
NameNode issues, and snapshotting.  In particular, snapshots are pretty
much a standard RDBMS feature.

> Managing the block cache off-heap might work but you also might get there
> and find the DBB accounting overhead kills you.

Lucene uses/abuses ref counting, so I'm familiar with the downsides:
when it works it's great; when it doesn't, it's a nightmare to debug.
It is possible to make it work, though.  I don't think there would be
overhead from it; i.e., any pool of objects implements ref counting anyway.
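
For what it's worth, the shape I have in mind is something like this; a
minimal sketch, not HBase code, assuming the cache hands out blocks backed
by direct byte buffers:

import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Minimal sketch of a ref-counted, off-heap cached block (illustration
 * only).  The cache holds one reference; each reader retains before use
 * and releases after, and the buffer can only go back to a pool once the
 * count drops to zero.
 */
final class OffHeapBlock {
  private final ByteBuffer data;                                // direct (off-heap) storage
  private final AtomicInteger refCount = new AtomicInteger(1);  // the cache's own reference

  OffHeapBlock(int size) {
    this.data = ByteBuffer.allocateDirect(size);
  }

  /** Take a reference; returns a duplicate so each reader gets its own position/limit. */
  ByteBuffer retain() {
    if (refCount.getAndIncrement() <= 0) {
      refCount.decrementAndGet();
      throw new IllegalStateException("block already released");
    }
    return data.duplicate();
  }

  /** Drop a reference; at zero the buffer is safe to recycle into a pool. */
  void release() {
    if (refCount.decrementAndGet() == 0) {
      // return 'data' to a buffer pool here, or just let it be collected
    }
  }
}

The nightmare-to-debug part is when a retain() and release() get unbalanced
somewhere, which is exactly the Lucene experience.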

It'd be nice to not have a block cache; however, it's necessary for
caching compressed [on-disk] blocks.
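
To be concrete about what the cache buys you, a toy sketch (gzip and string
keys are stand-ins for illustration, not what HBase actually uses): blocks
stay in memory in their compressed on-disk form and only get decompressed
when a reader asks for them.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPInputStream;

/**
 * Toy sketch of a cache that stores blocks exactly as they sit on disk
 * (compressed) and decompresses per read.  Memory cost is the compressed
 * size; the disk read is what gets saved.
 */
final class CompressedBlockCache {
  private final ConcurrentHashMap<String, byte[]> cache =
      new ConcurrentHashMap<String, byte[]>();

  /** Cache the raw bytes as read out of the file, still compressed. */
  void put(String blockKey, byte[] compressedBlock) {
    cache.put(blockKey, compressedBlock);
  }

  /** On a hit, decompress; on a miss, return null so the caller reads from HDFS. */
  byte[] get(String blockKey) throws IOException {
    byte[] compressed = cache.get(blockKey);
    if (compressed == null) {
      return null;
    }
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[4096];
      for (int n; (n = in.read(buf)) != -1; ) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }
}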

On Fri, Jul 8, 2011 at 7:05 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> Hey,
>
> When running on top of MapR, HBase has fast cached access to locally stored
> files; the MapR client ensures that. Likewise, HDFS should also ensure that
> local reads are fast and come out of cache as necessary, e.g. the kernel
> block cache.
>
> I wouldn't support mmap; it would require 2 different read-path
> implementations. You will never know when a read is not local.
>
> HDFS needs to provide faster local reads, IMO. Managing the block cache
> off-heap might work but you also might get there and find the DBB accounting
> overhead kills you.
> On Jul 8, 2011 6:47 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> wrote:
>> There are a couple of things here: one is direct byte buffers to put the
>> blocks outside of the heap, the other is mmap'ing the blocks directly from
>> the underlying HDFS file.
>>
>> I think they both make sense. And I'm not sure MapR's solution will
>> be that much better if the latter is implemented in HBase.
>>
>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>> The overhead in a byte buffer is the extra integers to keep track of the
>>> mark, position, limit.
>>>
>>> I am not sure that putting the block cache outside the heap is the way to go.
>>> Getting faster local DFS reads is important, and if you run HBase on top of
>>> MapR, these things are taken care of for you.
>>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
>>> wrote:
>>>> Also, it's for a good cause: moving the blocks out of the main heap using
>>>> direct byte buffers or some other more native-like facility (if DBBs
>>>> don't work).
>>>>
>>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>> Where? Everywhere? An array is 24 bytes, a BB is 56 bytes. Also the API
>>>>> is... annoying.
>>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
>>>>> wrote:
>>>>>> Is there an open issue for this? How hard will this be? :)
