[ https://issues.apache.org/jira/browse/HBASE-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969939#action_12969939 ]

Jonathan Gray commented on HBASE-3327:
--------------------------------------

It does help.  For flushes, the difference between CacheOnWrite and this is not 
that big.  This helps mostly in the face of compactions, I think.

One potential downside of keeping stuff in MemStore vs. block cache via 
CacheOnWrite is the relative efficiencies.  With a full increment workload what 
I see is major reductions in storage between MemStore -> block cache -> 
compressed files.  I'm seeing approximately 128MB -> 32MB -> 2-3MB (so block 
cache is 4X more efficient at storing the same data as MemStore, and compressed 
files another 10X).
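To make the space-efficiency arithmetic concrete, here is a small sketch that plugs in the footprints quoted above (128MB, 32MB, 2-3MB). The 256MB heap budget is a hypothetical figure chosen only for illustration, and the class and method names are made up for this example.

```java
public class StorageEfficiency {
    // Footprints reported in the comment for the same increment data, in MB.
    static final double MEMSTORE_MB = 128.0;
    static final double BLOCK_CACHE_MB = 32.0;
    static final double[] COMPRESSED_MB = {2.0, 3.0};

    // How many times smaller representation `to` is than representation `from`.
    static double ratio(double fromMb, double toMb) {
        return fromMb / toMb;
    }

    public static void main(String[] args) {
        System.out.printf("MemStore -> block cache: %.1fX%n",
            ratio(MEMSTORE_MB, BLOCK_CACHE_MB));
        for (double c : COMPRESSED_MB) {
            System.out.printf("block cache -> compressed (%.0f MB): %.1fX%n",
                c, ratio(BLOCK_CACHE_MB, c));
        }
        // Hypothetical heap budget: how many copies of this data set fit in each form.
        double budgetMb = 256.0;
        System.out.printf("%.0f MB holds %.1fx the data as MemStore, %.1fx as cached blocks%n",
            budgetMb, budgetMb / MEMSTORE_MB, budgetMb / BLOCK_CACHE_MB);
    }
}
```

The MemStore to block cache step works out to exactly 4X, and the 2-3MB compressed range gives roughly 11X to 16X over the block cache. With any fixed memory budget, retaining data as MemStore therefore holds about a quarter as much of the working set as caching the same data as blocks.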

I also think many of us suspect that reads out of MemStore are actually slower 
than reads out of the block cache.

I still think this is a really interesting potential direction but w/ 
CacheOnWrite and the difference in space efficiency, I think other 
optimizations may be better to target first.

> For increment workloads, retain memstores in memory after flushing them
> -----------------------------------------------------------------------
>
>                 Key: HBASE-3327
>                 URL: https://issues.apache.org/jira/browse/HBASE-3327
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>
> This is an improvement based on our observation of what happens in an 
> increment workload. The working set is typically small and is contained in 
> the memstores. 
> 1. The memstores get flushed because the limit on the number of WAL files 
> is hit. 
> 2. This in turn triggers compactions, which evict blocks from the block cache. 
> 3. The memstore flush and the block cache eviction cause disk reads for 
> increments that arrive afterwards, because the data is no longer in memory.
> We could solve this elegantly by retaining the memstores AFTER they are 
> flushed into files. This would mean we can quickly populate the new memstore 
> with the working set of data from memory itself without having to hit disk. 
> We can throttle the number of such memstores we retain, or the memory 
> allocated to them. In fact, allocating a percentage of the block cache to this 
> would give us a huge boost.
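A minimal sketch of the proposal in the description, assuming a single region whose memstore is a map from row to counter and whose store files are stood in for by another map. Everything here (RetainedMemstoreSketch, ENTRY_BYTES, the byte budget) is hypothetical and is not HBase's API. The block cache is not modeled, so any read not served by the active or a retained memstore counts as a disk read, which approximates the state the description reports after a flush followed by a compaction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class RetainedMemstoreSketch {
    // Hypothetical fixed per-entry size estimate used to enforce the retention budget.
    static final long ENTRY_BYTES = 64;

    private Map<String, Long> active = new HashMap<>();
    private final Deque<Map<String, Long>> retained = new ArrayDeque<>(); // newest first
    private final Map<String, Long> disk = new HashMap<>(); // stands in for store files
    private final long retainBudgetBytes;
    private long retainedBytes = 0;
    int diskReads = 0;

    RetainedMemstoreSketch(long retainBudgetBytes) {
        this.retainBudgetBytes = retainBudgetBytes;
    }

    // Increment path: active memstore, then retained snapshots, then "disk".
    long increment(String row, long delta) {
        Long current = active.get(row);
        if (current == null) current = lookupRetained(row);
        if (current == null) {
            diskReads++; // a miss in memory must consult the store files
            current = disk.getOrDefault(row, 0L);
        }
        long updated = current + delta;
        active.put(row, updated);
        return updated;
    }

    // Search retained snapshots newest first so the latest flushed value wins.
    private Long lookupRetained(String row) {
        for (Map<String, Long> snapshot : retained) {
            Long v = snapshot.get(row);
            if (v != null) return v;
        }
        return null;
    }

    // Flush: persist to "disk", then keep the snapshot in memory instead of dropping it.
    void flush() {
        disk.putAll(active);
        Map<String, Long> snapshot = active;
        active = new HashMap<>();
        retained.addFirst(snapshot);
        retainedBytes += snapshot.size() * ENTRY_BYTES;
        // Throttle: drop the oldest snapshots until retention fits the budget.
        while (retainedBytes > retainBudgetBytes && !retained.isEmpty()) {
            retainedBytes -= retained.removeLast().size() * ENTRY_BYTES;
        }
    }

    public static void main(String[] args) {
        for (long budget : new long[] {0, 1 << 20}) {
            RetainedMemstoreSketch store = new RetainedMemstoreSketch(budget);
            for (int round = 0; round < 3; round++) {
                for (int i = 0; i < 100; i++) store.increment("row" + i, 1);
                store.flush();
            }
            System.out.println("budget=" + budget + " diskReads=" + store.diskReads);
        }
    }
}
```

With a zero budget every round after a flush re-reads the whole working set from "disk" (300 reads over three rounds of 100 rows), while a budget large enough to hold the flushed snapshots only pays the cold reads of the first round (100). Evicting whole snapshots oldest first is one simple way to do the throttling the description mentions; carving the budget out of the block cache, as suggested, would be a separate design choice.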

-- 
This message is automatically generated by JIRA.