[
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610706#comment-15610706
]
Anoop Sam John commented on HBASE-16417:
----------------------------------------
We need in-memory merge for two reasons:
- To keep a bigger-sized Segment at the tail of the pipeline. This is important
to avoid creating small HFiles at flush, since we currently flush only the tail
of the pipeline in any case.
- To help concurrent reads. When there is only the active segment and it is a Map,
one seek/read of a particular cell is just a map.get() op. When there is one
more segment in the pipeline (a CellArrayMap), we need a binary search (yes,
it is not a linear search as in the case of HFile blocks) to reach the cell. When
there are many segments in the pipeline, we need more binary searches, which
compromises the latency of the read op. Doing in-between merges of the segments in
the pipeline reduces their number and so helps latency.
Yes, #2 (read latency) seems valid, and merge is the only way to address it. But
for #1 (a bigger tail segment), merge is not mandatory.
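
To make the read-latency point concrete, here is a minimal sketch of a point read over one active map plus several flattened segments. It is not HBase code: the class PipelineReadSketch, its fields, and the String keys and values are all hypothetical stand-ins for the active segment and the CellArrayMap segments, and a real read goes through scanners over all segments rather than stopping at the first hit. It only shows that a miss in the active segment costs one binary search per pipeline segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch, not the HBase implementation.
class PipelineReadSketch {
    // Stand-in for the active segment: a concurrent sorted map.
    final ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    // Stand-ins for flattened pipeline segments: sorted key arrays with
    // parallel value arrays, ordered newest segment first.
    final List<String[]> pipelineKeys = new ArrayList<>();
    final List<String[]> pipelineValues = new ArrayList<>();
    // Number of binary searches performed by the most recent get().
    int lastBinarySearches;

    // Appends a flattened segment as the oldest one; sortedKeys must be sorted.
    void addFlatSegment(String[] sortedKeys, String[] values) {
        pipelineKeys.add(sortedKeys);
        pipelineValues.add(values);
    }

    String get(String key) {
        lastBinarySearches = 0;
        // With only the active segment, the read is a single map lookup.
        String fromActive = active.get(key);
        if (fromActive != null) {
            return fromActive;
        }
        // Otherwise every pipeline segment may need its own binary search.
        for (int i = 0; i < pipelineKeys.size(); i++) {
            lastBinarySearches++;
            int idx = Arrays.binarySearch(pipelineKeys.get(i), key);
            if (idx >= 0) {
                return pipelineValues.get(i)[idx];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PipelineReadSketch store = new PipelineReadSketch();
        store.active.put("row1", "active-value");
        store.addFlatSegment(new String[] {"row2"}, new String[] {"seg1-value"});
        store.addFlatSegment(new String[] {"row3"}, new String[] {"seg2-value"});
        store.addFlatSegment(new String[] {"row4"}, new String[] {"seg3-value"});
        System.out.println(store.get("row4") + " after "
            + store.lastBinarySearches + " binary searches");
    }
}
```

Merging the three flattened segments into one would bring the worst case in this sketch back to a single binary search.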
So IMO, whether we flush only the tail or all of the segments (pipeline + active)
is not directly related to merge. Even if we have merge, there is no issue in
flushing all of the segments. Right now we merge with every in-memory flush,
which means we try to keep only one segment in the pipeline. But the policy
discussed here is going to change that. Doing a merge with every in-memory
flush is very costly; all agree on that. So with the new policy we will have 3+
segments in the pipeline, and there is still the issue of flushing smaller-sized
files. So IMHO, we must flush all of the segments to disk.
In the case of index merge, where we know there are no duplicate cells and so we
avoid data compaction, there is no point at all in delaying the flush of the other
segments in the pipeline. In the case of data compaction, yes, it makes sense.
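
The difference between the two cases can be sketched as follows. This is a simplified illustration under assumptions, not the HBase code: SegmentMergeSketch, its Cell class, and the dropOlderVersions flag are hypothetical, a cell is reduced to a key plus a sequence id, and "data compaction" is reduced to keeping only the newest version of each key (real compaction also has to handle max versions, timestamps, and delete markers).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not the HBase implementation.
class SegmentMergeSketch {
    // Simplified cell: a key and a sequence id (higher means newer).
    static final class Cell {
        final String key;
        final long seqId;
        Cell(String key, long seqId) {
            this.key = key;
            this.seqId = seqId;
        }
    }

    // Sort by key, then newest version first.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int byKey = a.key.compareTo(b.key);
        return byKey != 0 ? byKey : Long.compare(b.seqId, a.seqId);
    };

    // K-way merge of segments that are each already sorted by ORDER.
    // dropOlderVersions = false: index merge, every cell is kept.
    // dropOlderVersions = true: simplified data compaction, only the newest
    // version of each key survives.
    static List<Cell> merge(List<List<Cell>> segments, boolean dropOlderVersions) {
        // Each heap entry is {segment index, position within that segment}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> ORDER.compare(segments.get(x[0]).get(x[1]),
                                    segments.get(y[0]).get(y[1])));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<Cell> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            Cell cell = segments.get(cur[0]).get(cur[1]);
            boolean olderVersion = !out.isEmpty()
                && out.get(out.size() - 1).key.equals(cell.key);
            if (!(dropOlderVersions && olderVersion)) {
                out.add(cell);
            }
            if (cur[1] + 1 < segments.get(cur[0]).size()) {
                heap.add(new int[] {cur[0], cur[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Cell>> segments = Arrays.asList(
            Arrays.asList(new Cell("a", 1), new Cell("b", 2)),
            Arrays.asList(new Cell("a", 5), new Cell("c", 3)));
        System.out.println("index merge keeps " + merge(segments, false).size() + " cells");
        System.out.println("data compaction keeps " + merge(segments, true).size() + " cells");
    }
}
```

When there are no duplicate cells, both modes produce the same output, so in this sketch the merge reduces the number of segments but not the amount of data that eventually has to be flushed.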
On Y's data compaction use case, I have some questions. Is it an increment
workload? Or are they put ops where many duplicated cells come in?
> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
> Issue Type: Sub-task
> Reporter: Anastasia Braginsky
> Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
>