On Jul 8, 2011 7:19 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
wrote:
>
> > When running on top of Mapr, hbase has fast cached access to locally
> > stored files, the Mapr client ensures that. Likewise, hdfs should also
> > ensure that local reads are fast and come out of cache as necessary. Eg:
> > the kernel block cache.
>
> Agreed!  However I don't see how that's possible today.  Eg, it'd
> require more of a byte buffer type of API to HDFS, random reads not
> using streams.  It's easy to add.

I don't think it's as easy as you say. And even using the stream API, Mapr
delivers a lot more performance. And this is from my own tests, not a white
paper.

>
> I think the biggest win for HBase with MapR is the lack of the
> NameNode issues and snapshotting.  In particular, snapshots are pretty
> much a standard RDBMS feature.

That is good too - if you are using hbase in real-time production, you need to
look at Mapr.

But even beyond that the performance improvements are insane. We are talking
like 8-9x perf on my tests. Not to mention substantially reduced latency.

I'll repeat again, local accelerated access is going to be a required
feature. It already is.

I investigated using DBBs once upon a time and concluded that managing the ref
counts would be a nightmare; the better solution was to copy KeyValues out of
the DBB during scans.
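To make the copy-out approach concrete, here is a minimal hypothetical sketch
(not HBase's actual scanner code; names are made up) of reading a value out of
a direct (off-heap) ByteBuffer into an on-heap byte[], so the caller never
retains a reference into the off-heap block and no ref counting is needed:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CopyOutDemo {
    // Copy `length` bytes starting at `offset` out of a shared off-heap
    // block into a fresh on-heap array. duplicate() gives us independent
    // position/limit so concurrent readers don't disturb each other.
    static byte[] copyValue(ByteBuffer block, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = block.duplicate();
        dup.position(offset);
        dup.get(out, 0, length);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(16);
        block.put(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        byte[] copied = copyValue(block, 2, 3);
        System.out.println(Arrays.toString(copied)); // [3, 4, 5]
    }
}
```

The cost is one allocation and one copy per value read, which is the trade the
thread is weighing against explicit ref counting.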

Injecting refcount code seems like a worse remedy than the problem. HBase
doesn't have that many bugs, but explicit ref counting everywhere seems
dangerous, especially when a perf solution is already here: use Mapr or
HDFS-347/local reads.
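For illustration, a toy sketch (names hypothetical, not from HBase) of the
explicit retain/release discipline being argued against. Every reader must
pair a retain() with exactly one release(); a missed release leaks the block
forever, and a doubled one frees it while readers still hold it:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedBlock {
    // Creator starts with one reference; block is recycled when count hits 0.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean freed = false;

    void retain() {
        if (refs.getAndIncrement() <= 0) {
            throw new IllegalStateException("retain() after free");
        }
    }

    void release() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            freed = true; // in a real cache: return the buffer to the pool here
        } else if (n < 0) {
            throw new IllegalStateException("double release()");
        }
    }

    boolean isFreed() { return freed; }

    public static void main(String[] args) {
        RefCountedBlock block = new RefCountedBlock();
        block.retain();   // a scanner takes a reference
        block.release();  // scanner finishes
        block.release();  // creator drops its reference -> block is recycled
        System.out.println(block.isFreed()); // true
    }
}
```

The counting itself is trivial; the nightmare is that every code path touching
a block, including exception paths, must get the pairing exactly right.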
>
> > Managing the block cache off-heap might work but you also might get there
> > and find the dbb accounting overhead kills.
>
> Lucene uses/abuses ref counting so I'm familiar with the downsides.
> When it works, it's great, when it doesn't it's a nightmare to debug.
> It is possible to make it work though.  I don't think there would be
> overhead from it, ie, any pool of objects implements ref counting.
>
> It'd be nice to not have a block cache however it's necessary for
> caching compressed [on disk] blocks.
>
> On Fri, Jul 8, 2011 at 7:05 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> > Hey,
> >
> > When running on top of Mapr, hbase has fast cached access to locally
> > stored files, the Mapr client ensures that. Likewise, hdfs should also
> > ensure that local reads are fast and come out of cache as necessary. Eg:
> > the kernel block cache.
> >
> > I wouldn't support mmap, it would require 2 different read path
> > implementations. You will never know when a read is not local.
> >
> > Hdfs needs to provide faster local reads imo. Managing the block cache
> > off-heap might work but you also might get there and find the dbb
> > accounting overhead kills.
> > On Jul 8, 2011 6:47 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> > wrote:
> >> There are couple of things here, one is direct byte buffers to put the
> >> blocks outside of heap, the other is MMap'ing the blocks directly from
> >> the underlying HDFS file.
> >>
> >> I think they both make sense. And I'm not sure MapR's solution will
> >> be that much better if the latter is implemented in HBase.
> >>
> >> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >>> The overhead in a byte buffer is the extra integers to keep track of
the
> >>> mark, position, limit.
> >>>
> >>> I am not sure that putting the block cache off-heap is the way to go.
> >>> Getting faster local dfs reads is important, and if you run hbase on
> >>> top of Mapr, these things are taken care of for you.
> >>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> >>> wrote:
> >>>> Also, it's for a good cause, moving the blocks out of main heap using
> >>>> direct byte buffers or some other more native-like facility (if DBB's
> >>>> don't work).
> >>>>
> >>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com>
> >>>> wrote:
> >>>>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the
> >>>>> API is...annoying.
> >>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> >>>>> wrote:
> >>>>>> Is there an open issue for this? How hard will this be? :)
> >>>>>
> >>>
> >
