> When running on top of Mapr, hbase has fast cached access to locally stored
> files, the Mapr client ensures that. Likewise, hdfs should also ensure that
> local reads are fast and come out of cache as necessary. Eg: the kernel
> block cache.
Agreed! However I don't see how that's possible today. E.g., it'd require
more of a byte buffer type of API to HDFS: random reads that don't go
through streams. It's easy to add. I think the biggest win for HBase with
MapR is the lack of the NameNode issues, plus snapshotting. In particular,
snapshots are pretty much a standard RDBMS feature.

> Managing the block cache not in heap might work but you also might get
> there and find the dbb accounting overhead kills.

Lucene uses/abuses ref counting, so I'm familiar with the downsides. When it
works, it's great; when it doesn't, it's a nightmare to debug. It is
possible to make it work, though. I don't think there would be overhead from
it, i.e., any pool of objects implements ref counting. It'd be nice to not
have a block cache; however, it's necessary for caching compressed [on disk]
blocks.

On Fri, Jul 8, 2011 at 7:05 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> Hey,
>
> When running on top of Mapr, hbase has fast cached access to locally
> stored files, the Mapr client ensures that. Likewise, hdfs should also
> ensure that local reads are fast and come out of cache as necessary. Eg:
> the kernel block cache.
>
> I wouldn't support mmap, it would require 2 different read path
> implementations. You will never know when a read is not local.
>
> Hdfs needs to provide faster local reads imo. Managing the block cache
> not in heap might work but you also might get there and find the dbb
> accounting overhead kills.
> On Jul 8, 2011 6:47 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> wrote:
>> There are a couple of things here, one is direct byte buffers to put the
>> blocks outside of heap, the other is MMap'ing the blocks directly from
>> the underlying HDFS file.
>>
>> I think they both make sense. And I'm not sure MapR's solution will
>> be that much better if the latter is implemented in HBase.
>>
>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>> The overhead in a byte buffer is the extra integers to keep track of
>>> the mark, position, limit.
>>>
>>> I am not sure that putting the block cache in to heap is the way to go.
>>> Getting faster local dfs reads is important, and if you run hbase on
>>> top of Mapr, these things are taken care of for you.
>>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
>>> wrote:
>>>> Also, it's for a good cause, moving the blocks out of main heap using
>>>> direct byte buffers or some other more native-like facility (if DBB's
>>>> don't work).
>>>>
>>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>>> Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API
>>>>> is...annoying.
>>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen"
>>>>> <jason.rutherg...@gmail.com> wrote:
>>>>>> Is there an open issue for this? How hard will this be? :)
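The "byte buffer type of API to HDFS, random reads not using streams" point
above can be made concrete: a positional read fills a caller-supplied buffer
at an explicit offset, with no shared stream position. This is only a
minimal sketch using plain java.nio's FileChannel as a stand-in for a local
block file; the class and method names are hypothetical, not HDFS or HBase
code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalRead {
    // Read `length` bytes starting at `offset` into a direct buffer.
    // No stream position is advanced, so concurrent readers on the same
    // channel never interfere with each other.
    static ByteBuffer pread(FileChannel ch, long offset, int length)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(length);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position());
            if (n < 0) break; // hit EOF before filling the buffer
        }
        buf.flip(); // make the bytes just read available to the caller
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("blk", ".dat");
        Files.write(p, "hello block cache".getBytes());
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer b = pread(ch, 6, 5);
            byte[] out = new byte[b.remaining()];
            b.get(out);
            System.out.println(new String(out)); // prints "block"
        }
    }
}
```

Because each call carries its own offset and destination buffer, many reader
threads can hit the same open file without any per-stream locking, which is
exactly what a random-access block cache fill wants.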
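The ref-counting scheme debated above (off-heap blocks in direct byte
buffers, with "dbb accounting" to know when a buffer can be recycled) could
look roughly like this. This is a hypothetical sketch with invented names,
not HBase's actual block cache: each block holds a direct ByteBuffer,
readers retain before use and release after, and the last release returns
the buffer to a free pool.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// A cache block backed by off-heap (direct) memory. The cache holds the
// initial reference; each reader retains/releases around its access. When
// the count drops to zero the buffer goes back to the pool for reuse,
// avoiding both GC pressure and repeated direct allocations.
public class RefCountedBlock {
    private final ByteBuffer buf;                         // off-heap storage
    private final AtomicInteger refs = new AtomicInteger(1); // cache's ref
    private final ConcurrentLinkedQueue<ByteBuffer> pool;

    RefCountedBlock(ByteBuffer buf, ConcurrentLinkedQueue<ByteBuffer> pool) {
        this.buf = buf;
        this.pool = pool;
    }

    // Take a reference; each reader gets a duplicate so position/limit
    // are private to it while the off-heap bytes are shared.
    ByteBuffer retain() {
        for (;;) {
            int n = refs.get();
            if (n == 0) throw new IllegalStateException("block already freed");
            if (refs.compareAndSet(n, n + 1)) return buf.duplicate();
        }
    }

    // Drop a reference; the last release recycles the buffer.
    void release() {
        if (refs.decrementAndGet() == 0) {
            buf.clear();
            pool.offer(buf);
        }
    }

    int refCount() { return refs.get(); }
}
```

A usage sketch: the reader's retain/release pair costs one CAS each, which
is the "any pool of objects implements ref counting" point — the accounting
itself is cheap; the hard part is debugging a missed release.

```java
ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
RefCountedBlock blk =
    new RefCountedBlock(ByteBuffer.allocateDirect(64 * 1024), pool);
ByteBuffer view = blk.retain(); // reader starts
blk.release();                  // reader done; cache still holds a ref
blk.release();                  // eviction drops the cache's ref: recycled
```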