[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610706#comment-15610706 ]

Anoop Sam John commented on HBASE-16417:
----------------------------------------

We need in-memory merge for two reasons:
- To keep the tail of the pipeline as a bigger Segment. This is important to
avoid small HFiles being created at flush, since we currently flush only the
tail of the pipeline in any case.
- To help concurrent reads. When there is only the active segment and it is a
Map, one seek/read of a particular cell is just a map.get() op. When there are
one or more segments in the pipeline (each a CellArrayMap), we need a binary
search per segment (yes, it is not a linear search as in the case of HFile
blocks) to reach the cell. When there are many segments in the pipeline, we
need more binary searches and so compromise the latency of the read op.
Merging segments in the pipeline in between reduces their number and so helps
latency.
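The read-latency point above can be illustrated with a simplified model. This
is a hypothetical sketch, not HBase code: keys are plain longs, `ArraySegment`
only loosely stands in for a flattened CellArrayMap segment, a
`ConcurrentSkipListMap` stands in for the active segment, and `indexMerge`
assumes the segments share no keys (the index-merge case, with no duplicates
to drop). It counts one binary search per pipeline segment probed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified memstore model: keys are longs, values are strings.
// ArraySegment only loosely mirrors a flattened CellArrayMap segment.
class PipelineLookup {

    // Immutable pipeline segment backed by parallel sorted arrays.
    static final class ArraySegment {
        final long[] keys;
        final String[] values;

        ArraySegment(long[] keys, String[] values) {
            this.keys = keys;
            this.values = values;
        }

        String get(long key) {
            int i = Arrays.binarySearch(keys, key); // one binary search per segment
            return i >= 0 ? values[i] : null;
        }
    }

    // Active segment: a concurrent skip-list map, so a read is a single get().
    final ConcurrentSkipListMap<Long, String> active = new ConcurrentSkipListMap<>();
    // Pipeline segments, probed in list order after the active segment.
    final List<ArraySegment> pipeline = new ArrayList<>();
    int segmentProbes; // number of binary searches performed so far

    String get(long key) {
        String v = active.get(key);
        if (v != null) return v;
        for (ArraySegment s : pipeline) {
            segmentProbes++;
            v = s.get(key);
            if (v != null) return v;
        }
        return null;
    }

    // Index merge of two segments, assuming they share no keys,
    // so a plain sorted merge keeps every cell.
    static ArraySegment indexMerge(ArraySegment a, ArraySegment b) {
        long[] k = new long[a.keys.length + b.keys.length];
        String[] v = new String[k.length];
        int i = 0, j = 0, n = 0;
        while (i < a.keys.length || j < b.keys.length) {
            boolean takeA = j >= b.keys.length
                    || (i < a.keys.length && a.keys[i] < b.keys[j]);
            if (takeA) { k[n] = a.keys[i]; v[n++] = a.values[i++]; }
            else       { k[n] = b.keys[j]; v[n++] = b.values[j++]; }
        }
        return new ArraySegment(k, v);
    }

    // Collapse the whole pipeline into a single segment.
    void mergePipeline() {
        if (pipeline.size() < 2) return;
        ArraySegment merged = pipeline.get(0);
        for (int i = 1; i < pipeline.size(); i++) {
            merged = indexMerge(merged, pipeline.get(i));
        }
        pipeline.clear();
        pipeline.add(merged);
    }

    public static void main(String[] args) {
        PipelineLookup store = new PipelineLookup();
        for (int s = 0; s < 4; s++) {
            // Four small segments with disjoint keys {s, s + 10}.
            store.pipeline.add(new ArraySegment(new long[]{s, s + 10}, new String[]{"v" + s, "w" + s}));
        }
        store.get(3);
        System.out.println("binary searches before merge: " + store.segmentProbes); // 4
        store.mergePipeline();
        store.segmentProbes = 0;
        store.get(3);
        System.out.println("binary searches after merge: " + store.segmentProbes); // 1
    }
}
```

With four small segments the lookup of key 3 costs four binary searches; after
merging the pipeline into one segment it costs one. Real segments can hold
overlapping keys with different versions, which this sketch deliberately
ignores.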

Yes, #2 seems valid, and merge is the only way to address it. But for #1,
merge is not mandatory.

So IMO, whether we flush only the tail or the whole set of segments (pipeline +
active) is not directly related to merge. Even with merge, there is no issue in
flushing all of the segments. Today we merge on every in-memory flush, which
means we try to keep only one segment in the pipeline. But the policy discussed
here is going to change that: merging on every in-memory flush is very costly,
and all agree on that. This means we will have 3+ segments in the pipeline, and
flushing only the tail still leaves us flushing smaller files. So IMHO, we must
flush the whole memstore to disk.
In case of index merge, where we know there are no cell duplicates and so we
skip data compaction, there is no point at all in delaying the flush of the
other segments in the pipeline. In case of data compaction, yes, it makes sense.
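The flush-selection argument can be sketched as a small policy function. The
names `FlushPolicy`, `CompactionKind` and `segmentsToFlush` are hypothetical,
segments are reduced to their sizes in bytes, and the 32 MB / 16 MB figures are
made-up inputs, not measurements. With index merge everything (pipeline plus
active) goes to disk together; with data compaction only the tail is flushed,
so younger segments can still lose duplicates in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flush-selection policy; names are illustrative, not HBase APIs.
class FlushPolicy {

    enum CompactionKind { INDEX_MERGE, DATA_COMPACTION }

    // pipeline: segment sizes in bytes, newest first, tail (oldest) last.
    // Returns the sizes of the segments written to disk by this flush.
    static List<Long> segmentsToFlush(List<Long> pipeline, long activeSize, CompactionKind kind) {
        List<Long> out = new ArrayList<>();
        if (kind == CompactionKind.DATA_COMPACTION && !pipeline.isEmpty()) {
            // Data compaction: keep the younger segments in memory, since a later
            // compaction may still drop duplicate cells before they reach disk.
            out.add(pipeline.get(pipeline.size() - 1));
            return out;
        }
        // Index merge: no duplicates to remove, so delaying gains nothing.
        // Flush pipeline + active together as one larger file.
        out.addAll(pipeline);
        out.add(activeSize);
        return out;
    }

    static long total(List<Long> sizes) {
        long sum = 0;
        for (long s : sizes) sum += s;
        return sum;
    }

    public static void main(String[] args) {
        // Made-up sizes: three 32 MB pipeline segments and a 16 MB active segment.
        List<Long> pipeline = List.of(32L << 20, 32L << 20, 32L << 20);
        long active = 16L << 20;
        long whole = total(segmentsToFlush(pipeline, active, CompactionKind.INDEX_MERGE));
        long tail = total(segmentsToFlush(pipeline, active, CompactionKind.DATA_COMPACTION));
        System.out.println("index merge flushes " + (whole >> 20) + " MB");    // 112
        System.out.println("data compaction flushes " + (tail >> 20) + " MB"); // 32
    }
}
```

Under these assumed sizes, flushing the whole memstore writes one 112 MB file,
while flushing only the tail writes a 32 MB file and leaves the rest for later.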

On the data compaction use case of Y, I have some questions. Is it an increment
workload? Or are they put ops with many duplicate cells coming in?

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Anastasia Braginsky
>             Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
