On Jul 8, 2011 7:19 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com> wrote:
>
> > When running on top of MapR, HBase has fast cached access to locally stored
> > files, the MapR client ensures that. Likewise, HDFS should also ensure that
> > local reads are fast and come out of cache as necessary. Eg: the kernel
> > block cache.
>
> Agreed! However I don't see how that's possible today. Eg, it'd
> require more of a byte buffer type of API to HDFS, random reads not
> using streams. It's easy to add.

I don't think it's as easy as you say. And even using the stream API MapR
delivers a lot more performance. And this is from my own tests, not a white
paper.
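To make the gap concrete, here is roughly what the two read paths look like
(sketch only, untested; the positional read into a byte[] is Hadoop's real
PositionedReadable API, but the ByteBuffer variant is hypothetical):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.hadoop.fs.FSDataInputStream;

    public class ReadPaths {
      // What HDFS gives us today: a positional read, but always into a
      // heap byte[], so every block read costs an on-heap copy.
      static void readToday(FSDataInputStream in, long pos, byte[] buf)
          throws IOException {
        in.readFully(pos, buf, 0, buf.length);
      }

      // What Jason is describing (hypothetical, no such HDFS call today):
      // fill a (possibly direct) ByteBuffer straight from a local replica
      // or the kernel block cache, with no intermediate heap copy.
      // int read(long pos, ByteBuffer dst) throws IOException;
    }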
>
> I think the biggest win for HBase with MapR is the lack of the
> NameNode issues and snapshotting. In particular, snapshots are pretty
> much a standard RDBMS feature.

That is good too - if you are using HBase in real-time prod you need to look
at MapR. But even beyond that, the performance improvements are insane. We
are talking like 8-9x perf on my tests, not to mention substantially reduced
latency.

I'll repeat again: local accelerated access is going to be a required
feature. It already is.

I investigated using DBBs once upon a time. I concluded that managing the
ref counts would be a nightmare, and that the better solution was to copy
keyvalues out of the DBB during scans. Injecting refcount code seems like a
worse remedy than the problem. HBase doesn't have as many bugs, but explicit
ref counting everywhere seems dangerous, especially when a perf solution is
already here. Use MapR or HDFS-347/local reads.
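The copy-out approach is simple enough to show in a few lines (rough sketch,
illustrative offsets and names, not HBase's actual KeyValue layout):

    import java.nio.ByteBuffer;

    class DbbScan {
      // Copy one cell out of an off-heap cached block into a heap array.
      // Once the copy is made, the scanner holds no reference to the
      // buffer, so the cache can evict or reuse it with no ref counting.
      static byte[] copyCell(ByteBuffer block, int offset, int length) {
        byte[] onHeap = new byte[length];
        ByteBuffer dup = block.duplicate(); // leave shared position/limit alone
        dup.position(offset);
        dup.get(onHeap, 0, length);
        return onHeap;
      }
    }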
> > Managing the block cache in non-heap might work but you also might get
> > there and find the DBB accounting overhead kills.
>
> Lucene uses/abuses ref counting so I'm familiar with the downsides.
> When it works, it's great, when it doesn't it's a nightmare to debug.
> It is possible to make it work though. I don't think there would be
> overhead from it, ie, any pool of objects implements ref counting.
>
> It'd be nice to not have a block cache, however it's necessary for
> caching compressed [on disk] blocks.
>
> On Fri, Jul 8, 2011 at 7:05 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> > Hey,
> >
> > When running on top of MapR, HBase has fast cached access to locally
> > stored files, the MapR client ensures that. Likewise, HDFS should also
> > ensure that local reads are fast and come out of cache as necessary.
> > Eg: the kernel block cache.
> >
> > I wouldn't support mmap, it would require 2 different read path
> > implementations. You will never know when a read is not local.
> >
> > HDFS needs to provide faster local reads imo. Managing the block cache
> > in non-heap might work but you also might get there and find the DBB
> > accounting overhead kills.
> > On Jul 8, 2011 6:47 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> > wrote:
> >> There are a couple of things here: one is direct byte buffers to put
> >> the blocks outside of heap, the other is mmap'ing the blocks directly
> >> from the underlying HDFS file.
> >>
> >> I think they both make sense. And I'm not sure MapR's solution will
> >> be that much better if the latter is implemented in HBase.
> >>
> >> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >>> The overhead in a byte buffer is the extra integers to keep track of
> >>> the mark, position, and limit.
> >>>
> >>> I am not sure that putting the block cache in non-heap is the way to
> >>> go. Getting faster local dfs reads is important, and if you run HBase
> >>> on top of MapR, these things are taken care of for you.
> >>> On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> >>> wrote:
> >>>> Also, it's for a good cause: moving the blocks out of main heap using
> >>>> direct byte buffers or some other more native-like facility (if DBBs
> >>>> don't work).
> >>>>
> >>>> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >>>>> Where? Everywhere? An array is 24 bytes, a BB is 56 bytes. Also the
> >>>>> API is... annoying.
> >>>>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com>
> >>>>> wrote:
> >>>>>> Is there an open issue for this? How hard will this be? :)
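P.S. To spell out the accounting I'd rather avoid: every reader of an
off-heap block would have to participate in something like the following
(made-up names, a sketch, not real HBase code). Miss one release() and the
buffer leaks; call one too many and the cache recycles memory a scanner is
still reading.

    import java.nio.ByteBuffer;
    import java.util.concurrent.atomic.AtomicInteger;

    class RefCountedBlock {
      private final ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
      private final AtomicInteger refs = new AtomicInteger(1); // cache's own ref

      // Every scanner touching the block must retain()...
      ByteBuffer retain() {
        refs.incrementAndGet();
        return buf.duplicate();
      }

      // ...and pair it with exactly one release().
      void release() {
        if (refs.decrementAndGet() == 0) {
          // last reference gone: safe to recycle the buffer into the pool
        }
      }
    }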