I have a proposal that should improve Hypertable performance in certain
situations.  When running the HBase benchmark, the one test that we didn't
significantly beat HBase on was the random read test.  During the test, the
RangeServers were using just a little more than 800MB, which was the
configured size of the block cache.  However, HBase was using all of the RAM
that was configured.  I suspect the problem is that when we loaded the data
into Hypertable, the RangeServers aggressively compacted the data to keep
the commit log pruned back to a minimum, whereas HBase had left a
significant amount of data in their cell cache equivalent.  This would give
HBase an unfair advantage in the random read test since more of the dataset
would have been resident in memory.

In general, if the RangeServers have memory available to them, they should
use it.  I propose that after a minor compaction, we keep the
immutable cell cache in memory and have it overshadow the corresponding
CellStore on disk.  When the system determines that it needs more memory in
its regular maintenance task, it can purge these cell caches.
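
To make the idea concrete, here is a rough C++ sketch of what I have in
mind.  None of these names (ShadowCellCache, read_from_cell_store, etc.)
are existing Hypertable classes; they are just placeholders for the frozen
cell cache that would overshadow the CellStore and for the purge hook the
maintenance task would call.

#include <cstddef>
#include <map>
#include <mutex>
#include <string>

using Key = std::string;
using Value = std::string;

class ShadowCellCache {
public:
  void insert(const Key &key, const Value &value) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_bytes += key.size() + value.size();
    m_cells[key] = value;
  }

  // True if the cell is still resident in memory
  // (i.e. it overshadows the on-disk CellStore).
  bool lookup(const Key &key, Value &value) const {
    std::lock_guard<std::mutex> lock(m_mutex);
    auto it = m_cells.find(key);
    if (it == m_cells.end())
      return false;
    value = it->second;
    return true;
  }

  size_t memory_used() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_bytes;
  }

  // Called from the regular maintenance task when memory is needed elsewhere.
  void purge() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_cells.clear();
    m_bytes = 0;
  }

private:
  mutable std::mutex m_mutex;
  std::map<Key, Value> m_cells;
  size_t m_bytes = 0;
};

// Placeholder for the on-disk CellStore read.
bool read_from_cell_store(const Key &, Value &) { return false; }

// Read path: consult the shadow cell cache first, fall back to disk.
bool read_cell(const ShadowCellCache &shadow, const Key &key, Value &value) {
  if (shadow.lookup(key, value))
    return true;
  return read_from_cell_store(key, value);
}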

At some point we should probably have a learning algorithm, or at the very
least a heuristic that determines the best use of memory among these shadow
cell caches, the block cache, and the query cache.
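
For example, one simple heuristic (again, all names and ratios here are
just placeholders, not a worked-out design) would be to divide the
available memory among the three caches in proportion to their recent hit
rates, with a floor so that none of them starves:

#include <algorithm>
#include <cstdint>

struct CacheStats { double hit_rate; };  // fraction of lookups served, 0..1

struct MemoryBudget {
  uint64_t shadow_caches;
  uint64_t block_cache;
  uint64_t query_cache;
};

MemoryBudget divide_memory(uint64_t available, CacheStats shadow,
                           CacheStats block, CacheStats query) {
  const double floor = 0.10;  // guarantee each cache at least ~10%
  double total = shadow.hit_rate + block.hit_rate + query.hit_rate;
  if (total == 0.0)           // no history yet: split evenly
    return { available / 3, available / 3, available / 3 };
  double f_shadow = std::max(floor, shadow.hit_rate / total);
  double f_block  = std::max(floor, block.hit_rate / total);
  double f_query  = std::max(floor, query.hit_rate / total);
  double norm = f_shadow + f_block + f_query;   // renormalize after flooring
  return { static_cast<uint64_t>(available * f_shadow / norm),
           static_cast<uint64_t>(available * f_block / norm),
           static_cast<uint64_t>(available * f_query / norm) };
}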

- Doug
