> Oh BTW, you can't mmap anything in HBase unless you copy it to local
> disk first.  HDFS => no mmap.

Right.  I know that!  Once the block index is pluggable, the FST would
be an in heap byte[].
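
A minimal sketch of what "pluggable" could mean here, assuming the index only has to map a key to the block that may contain it. All names (`BlockIndex`, `InHeapBlockIndex`, `blockFor`) are hypothetical and are not HBase's actual API. The in-heap implementation below is a plain binary search over the first key of each block, and an FST-backed implementation would be another class behind the same interface.

```java
// Hypothetical interface: not part of HBase. A scanner would call
// blockFor(key) and then read only the block that is returned.
interface BlockIndex {
    /** Index of the block that may contain key, or -1 if key sorts before every block. */
    int blockFor(byte[] key);
}

// One possible implementation: keep the first key of every block on the heap.
final class InHeapBlockIndex implements BlockIndex {
    private final byte[][] firstKeys; // first key of each block, sorted ascending

    InHeapBlockIndex(byte[][] firstKeys) {
        this.firstKeys = firstKeys;
    }

    @Override
    public int blockFor(byte[] key) {
        // Binary search for the last block whose first key is <= key.
        int lo = 0, hi = firstKeys.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(firstKeys[mid], key) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Unsigned lexicographic byte comparison; a shorter array sorts first on a tie.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With an interface like this, the rest of the file format would depend only on `blockFor`, so the choice between a sorted array and an FST stays local to the index.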

On Sat, Jun 4, 2011 at 3:49 PM, Ryan Rawson <[email protected]> wrote:
> Oh BTW, you can't mmap anything in HBase unless you copy it to local
> disk first.  HDFS => no mmap.
>
> just thought you'd like to know.
>
> On Sat, Jun 4, 2011 at 3:41 PM, Jason Rutherglen
> <[email protected]> wrote:
>>> It can be hard to know you have all the corner cases down and you
>>> won't find out in 6 months that every single piece of data you have
>>> put in HBase is corrupt.  Keeping it simple is one strategy.
>>
>> Isn't the block index separate from the actual data?  So corruption in
>> that case is unlikely.
>>
>>> I have previously thought about prefix compression, it seemed doable,
>>> you'd need a compressing algorithm, then in the Scanner you would
>>> expand KeyValues
>>
>> I think we can try that later.  I'm not sure one can make a hard and
>> fast rule to always load the keys into RAM as an FST.  The block index
>> would seem to be fairly separate.
>>
>> On Sat, Jun 4, 2011 at 3:35 PM, Ryan Rawson <[email protected]> wrote:
>>> Also, don't break it :-)
>>>
>>> Part of the goal of HFile was to build something quick and reliable.
>>> It can be hard to know you have all the corner cases down and you
>>> won't find out in 6 months that every single piece of data you have
>>> put in HBase is corrupt.  Keeping it simple is one strategy.
>>>
>>> I have previously thought about prefix compression, it seemed doable,
>>> you'd need a compressing algorithm, then in the Scanner you would
>>> expand KeyValues and callers would end up with copies, not views on,
>>> the original data.  The JVM is fairly good about short lived objects
>>> (up to a certain allocation rate that is), and while the original goal
>>> was to reduce memory usage, it could make sense to take a higher short
>>> term allocation rate if the wins from prefix compression are there.
>>>
>>> Also note that in whole-system profiling, often repeated methods in
>>> KeyValue do pop up.  The goal of KeyValue was to have a format that
>>> didn't require deserialization into larger data structures (hence the
>>> lack of vint), and would be simple and fast.  Undoing that work should
>>> be accompanied by profiling evidence that new slowdowns were not
>>> introduced.
>>>
>>> -ryan
>>>
>>> On Sat, Jun 4, 2011 at 3:30 PM, Jason Rutherglen
>>> <[email protected]> wrote:
>>>>> You'd have to change how the Scanner code works, etc.  You'll find out.
>>>>
>>>> Nice!  Sounds fun.
>>>>
>>>> On Sat, Jun 4, 2011 at 3:27 PM, Ryan Rawson <[email protected]> wrote:
>>>>> What are the specs/goals of a pluggable block index?  Right now the
>>>>> block index is fairly tied deep in how HFile works. You'd have to
>>>>> change how the Scanner code works, etc.  You'll find out.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Jun 4, 2011 at 3:17 PM, Stack <[email protected]> wrote:
>>>>>> I do not know of one.  FYI hfile is pretty standalone regards tests etc.
>>>>>> There is even a perf testing class for hfile
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 4, 2011, at 14:44, Jason Rutherglen <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>>> I want to take a wh/hack at creating a pluggable block index, is there
>>>>>>> an open issue for this?  I looked and couldn't find one.
>>>>>>
>>>>>
>>>>
>>>
>>
>
