> Oh BTW, you can't mmap anything in HBase unless you copy it to local
> disk first. HDFS => no mmap.
Right. I know that! Once the block index is pluggable, the FST would be
an in heap byte[].

On Sat, Jun 4, 2011 at 3:49 PM, Ryan Rawson <[email protected]> wrote:
> Oh BTW, you can't mmap anything in HBase unless you copy it to local
> disk first. HDFS => no mmap.
>
> just thought you'd like to know.
>
> On Sat, Jun 4, 2011 at 3:41 PM, Jason Rutherglen
> <[email protected]> wrote:
>>> It can be hard to know you have all the corner cases down and you
>>> won't find out in 6 months that every single piece of data you have
>>> put in HBase is corrupt. Keeping it simple is one strategy.
>>
>> Isn't the block index separate from the actual data? So corruption in
>> that case is unlikely.
>>
>>> I have previously thought about prefix compression, it seemed doable,
>>> you'd need a compressing algorithm, then in the Scanner you would
>>> expand KeyValues
>>
>> I think we can try that later. I'm not sure one can make a hard and
>> fast rule to always load the keys into RAM as an FST. The block index
>> would seem to be fairly separate.
>>
>> On Sat, Jun 4, 2011 at 3:35 PM, Ryan Rawson <[email protected]> wrote:
>>> Also, dont break it :-)
>>>
>>> Part of the goal of HFile was to build something quick and reliable.
>>> It can be hard to know you have all the corner cases down and you
>>> won't find out in 6 months that every single piece of data you have
>>> put in HBase is corrupt. Keeping it simple is one strategy.
>>>
>>> I have previously thought about prefix compression, it seemed doable,
>>> you'd need a compressing algorithm, then in the Scanner you would
>>> expand KeyValues and callers would end up with copies, not views on,
>>> the original data. The JVM is fairly good about short lived objects
>>> (up to a certain allocation rate that is), and while the original goal
>>> was to reduce memory usage, it could make sense to take a higher short
>>> term allocation rate if the wins from prefix compression are there.
>>>
>>> Also note that in whole-system profiling, often repeated methods in
>>> KeyValue do pop up. The goal of KeyValue was to have a format that
>>> didnt require deserialization into larger data structures (hence the
>>> lack of vint), and would be simple and fast. Undoing that work should
>>> be accompanied with profiling evidence that new slowdowns were not
>>> introduced.
>>>
>>> -ryan
>>>
>>> On Sat, Jun 4, 2011 at 3:30 PM, Jason Rutherglen
>>> <[email protected]> wrote:
>>>>> You'd have to change how the Scanner code works, etc. You'll find out.
>>>>
>>>> Nice! Sounds fun.
>>>>
>>>> On Sat, Jun 4, 2011 at 3:27 PM, Ryan Rawson <[email protected]> wrote:
>>>>> What are the specs/goals of a pluggable block index? Right now the
>>>>> block index is fairly tied deep in how HFile works. You'd have to
>>>>> change how the Scanner code works, etc. You'll find out.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Jun 4, 2011 at 3:17 PM, Stack <[email protected]> wrote:
>>>>>> I do not know of one. FYI hfile is pretty standalone regards tests etc.
>>>>>> There is even a perf testing class for hfile
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 4, 2011, at 14:44, Jason Rutherglen <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I want to take a wh/hack at creating a pluggable block index, is there
>>>>>>> an open issue for this? I looked and couldn't find one.
>>>>>>
>>>>>
>>>>
>>>
>>
>
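For anyone following the thread, here is a rough sketch of the prefix compression idea quoted above: sorted keys are stored as (shared prefix length, suffix length, suffix) records, and expanding them hands the caller fresh byte[] copies rather than views on the original buffer. This is only an illustration under stated assumptions. The class and method names are hypothetical, single-byte length fields (keys up to 255 bytes) are a simplification, and none of this reflects how HFile or KeyValue actually lay out data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code: prefix compression over sorted keys.
final class PrefixCompressionSketch {

    /** Number of leading bytes that a and b have in common. */
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /**
     * Encodes keys (assumed sorted) as records of
     * [sharedLen:1 byte][suffixLen:1 byte][suffix bytes].
     * Assumption for brevity: every key is at most 255 bytes long.
     */
    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            if (key.length > 255) {
                throw new IllegalArgumentException("key too long for this sketch");
            }
            int shared = commonPrefix(prev, key);
            int suffixLen = key.length - shared;
            out.write(shared);
            out.write(suffixLen);
            out.write(key, shared, suffixLen);
            prev = key;
        }
        return out.toByteArray();
    }

    /**
     * Expands the block back into full keys. Each key is a newly
     * allocated copy, which is the short-lived allocation cost
     * mentioned in the thread.
     */
    static List<byte[]> decode(byte[] block) {
        List<byte[]> keys = new ArrayList<>();
        byte[] prev = new byte[0];
        int pos = 0;
        while (pos < block.length) {
            int shared = block[pos++] & 0xFF;
            int suffixLen = block[pos++] & 0xFF;
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared);
            System.arraycopy(block, pos, key, shared, suffixLen);
            pos += suffixLen;
            keys.add(key);
            prev = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
            "row-0001/cf:a".getBytes(StandardCharsets.UTF_8),
            "row-0001/cf:b".getBytes(StandardCharsets.UTF_8),
            "row-0002/cf:a".getBytes(StandardCharsets.UTF_8));
        byte[] block = encode(keys);
        System.out.println("encoded bytes: " + block.length);
        for (byte[] k : decode(block)) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
    }
}
```

The decode loop is where the trade-off discussed above shows up: every expanded key costs one allocation and two array copies, so any real change along these lines would need the profiling evidence asked for in the thread.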
