On Wed, Oct 26, 2011 at 2:50 PM, Vladimir Rodionov
<[email protected]> wrote:
>>> Are you hitting cache at all?
>>
>> It's totally random, due to the proposed key design, which favored fast
>> inserts. Keys are randomized values, which is why there is no data locality
>> in row lookups. The effect of the cache (LruBlockCache?) is negligible
>> in this case.
>>
>
>>>So a different schema would get cache into the mix?
>
> You can't change the schema while the system is in production.
>

True, but caveat Ted's note: the FB fellas apparently did it three times
before they hit on the 'right' schema (not sure whether they took the
portion being modified offline while changing the schema).
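
For what it is worth, here is a toy sketch of what 'a different schema'
might look like if you ever get a window to change it: lead the row key
with something your reads group by, so consecutive gets for one entity hit
blocks already sitting in the LRU instead of landing on random blocks.
The entity/timestamp names below are invented for illustration, not your
actual schema:

  import org.apache.hadoop.hbase.util.Bytes;

  public class KeySketch {
    // Rows for one entity sort together; Long.MAX_VALUE - ts puts newest first.
    static byte[] rowKey(String entityId, long ts) {
      return Bytes.add(Bytes.toBytes(entityId), Bytes.toBytes(Long.MAX_VALUE - ts));
    }
  }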

>
>>>Its going to keep growing without bound?
>
>
> No, we keep data for XX days, then purge stale data from the table.
>
>
> My question was: what else, besides the obvious "run it all in parallel",
> can help to improve random I/O?
>
> 1. Will Bloom filters help to optimize the HBase read path?

Yes.  0.92 blooms will be less expensive than those in 0.90 (because
the blooms are tiered and live in the LRU in 0.92 so they are let go
if unused).
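
In case it is useful, here is roughly what turning ROW blooms on for an
existing family looks like.  API names are from memory against the
0.92-era client, and the table/family names are made up, so check the
signatures against your version:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.regionserver.StoreFile;

  public class EnableRowBloom {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      // A fresh descriptor resets other family settings (compression, TTL, ...);
      // in practice fetch the existing descriptor and modify that instead.
      HColumnDescriptor cf = new HColumnDescriptor("cf");
      cf.setBloomFilterType(StoreFile.BloomType.ROW); // ROWCOL if you get single columns
      admin.disableTable("mytable");
      admin.modifyColumn("mytable", cf);
      admin.enableTable("mytable");
    }
  }

Blooms pay off in exactly your case -- random gets that mostly miss the
block cache -- because a get can skip store files that cannot contain the
row.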


> 2. We use compression already.
> 3. Block size - does it really matter much?

Not much in my experience.  Smaller blocks can help a little at the
cost of some bloat in index size (again, 0.92 is better here because
indices are partitioned and now also live in the LRU rather than being
pegged in RAM as they are in 0.90).
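
If you do want to play with it, block size is also a per-family setting.
A hypothetical tweak, applied with the same disable/modifyColumn/enable
cycle as the bloom sketch above:

  HColumnDescriptor cf = new HColumnDescriptor("cf"); // made-up family name
  cf.setBlocksize(8 * 1024); // default is 64KB; smaller blocks mean less data
                             // read per random get, but a bigger block index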

> 4. Off-heap block cache? Is it in 0.92 trunk? Has anybody performed real
> performance tests on the off-heap cache?
>

Off-heap cache is experimental in 0.92 and TRUNK.
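
If you want to try it anyway, my recollection of the knobs (verify against
the 0.92 docs before relying on this) is a percentage property in
hbase-site.xml plus a direct-memory ceiling for the region server JVM:

  <!-- hbase-site.xml: sizes the experimental off-heap (slab) cache; 0, the default, leaves it off -->
  <property>
    <name>hbase.offheapcache.percentage</name>
    <value>0.6</value>
  </property>

  # hbase-env.sh: direct memory the slabs are carved out of
  export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=4g"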

St.Ack
