>> Are you hitting cache at all?
>
> It's totally random, due to the proposed key design, which favored fast
> inserts. Keys are randomized values, which is why there is no data locality
> in row lookups. The effect of the cache (LruBlockCache?) is negligible in
> this case.
>

>> So a different schema would get cache into the mix?

You can't change the schema while the system is in production.


>> It's going to keep growing without bound?


No, we keep data for XX days, then purge stale data from the table.


My question was: what else, besides the obvious (run everything in parallel),
can help improve random I/O?

1. Will a Bloom filter help optimize the HBase read path?
2. We already use compression.
3. Block size: does it really matter much?
4. Off-heap block cache: is it in the 0.92 trunk? Has anybody run real
performance tests on the off-heap cache?
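On point 1: a row-level Bloom filter is enabled per column family, and on the read path it lets a get skip HFiles that cannot contain the requested row key, which is exactly the random-key lookup pattern described above. A minimal sketch from the HBase shell (the table name 'metrics' and family 'd' are hypothetical placeholders):

```
hbase> alter 'metrics', {NAME => 'd', BLOOMFILTER => 'ROW'}
hbase> major_compact 'metrics'   # blooms are written out on flush/compaction
```

Note that existing HFiles only gain blooms after they are rewritten, hence the compaction; `ROWCOL` is the alternative when lookups are for specific row+column pairs.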

We could easily allocate 10-15 GB per node, effectively caching hot data from
the other tables (not the fact table).

Off-heap cache: what is the maximum size of off-heap cache we could try?
My major concerns are:

- Memory allocators are pretty hard to debug and get working right.
- Memory fragmentation.
- It still relies on on-heap Java data structures to perform eviction, which
can degrade performance for large caches.
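For what it's worth, the 0.92-era off-heap cache (SlabCache) is sized as a fraction of the JVM's direct-memory limit. A minimal configuration sketch, assuming the 0.92 property name and with illustrative (not recommended) sizes:

```
# hbase-env.sh: grant the region server direct memory for the slab cache
export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=12g"
```

```xml
<!-- hbase-site.xml: fraction of MaxDirectMemorySize used by the off-heap cache -->
<property>
  <name>hbase.offheapcache.percentage</name>
  <value>0.95</value>
</property>
```

This does not remove the last concern above: the slab cache still tracks entries with on-heap structures for lookup and eviction, so only the cached blocks themselves move off-heap.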
