[
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701629#comment-15701629
]
Anastasia Braginsky commented on HBASE-17081:
---------------------------------------------
Thank you for your insights, [~ram_krish]!
bq. What I found was that with only flushing the tail anything more than 6
Do you mean with merges? That is, merging every 6 segments in the pipeline and flushing only the tail?
It is reasonable that you got "too many store files" then. It should not happen
with the composite snapshot.
On average, a flush-to-disk is needed after every 4 in-memory flushes. Thus, if
THRESHOLD_PIPELINE_SEGMENTS is higher than 5, merges should be rare, unless
the entire system is under stress.
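To make the arithmetic concrete, here is a toy model (not actual HBase code; the class names, the 4:1 flush ratio, and the threshold value of 5 are assumptions taken from the discussion above): every in-memory flush pushes one immutable segment into the pipeline, a merge would trigger only when the segment count exceeds THRESHOLD_PIPELINE_SEGMENTS, and a composite-snapshot flush-to-disk drains the whole pipeline.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the compaction pipeline. With a flush-to-disk after every
// 4th in-memory flush and a threshold of 5, the pipeline never grows past
// 4 segments, so merge() is never triggered.
public class PipelineModel {
    static final int THRESHOLD_PIPELINE_SEGMENTS = 5; // assumed threshold
    private final Deque<String> pipeline = new ArrayDeque<>(); // head = newest
    private int mergeCount = 0;

    void inMemoryFlush(String segment) {
        pipeline.addFirst(segment);
        if (pipeline.size() > THRESHOLD_PIPELINE_SEGMENTS) {
            merge();
        }
    }

    void merge() {
        // Collapse all pipeline segments into one flat segment.
        String merged = String.join("+", pipeline);
        pipeline.clear();
        pipeline.addFirst(merged);
        mergeCount++;
    }

    void flushToDiskComposite() {
        // Composite snapshot: the whole pipeline goes to disk at once.
        pipeline.clear();
    }

    int segments() { return pipeline.size(); }
    int merges() { return mergeCount; }

    public static void main(String[] args) {
        PipelineModel m = new PipelineModel();
        for (int i = 0; i < 100; i++) {
            m.inMemoryFlush("seg" + i);
            if (i % 4 == 3) {             // every 4th in-memory flush...
                m.flushToDiskComposite(); // ...is followed by a flush-to-disk
            }
        }
        System.out.println("merges=" + m.merges()); // prints merges=0
    }
}
```

Under these assumed numbers the merge path is never exercised; merges would only appear if flush-to-disk requests lagged further behind the in-memory flushes.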
bq. The one thing that could be a problem is that when we have scans then we
need to scan 10 segments
This JIRA is intended to provide a *mechanism of composite snapshot* without
*optimizing THRESHOLD_PIPELINE_SEGMENTS*. Under HBASE-16417, Eshcar is
running experiments with an infinite THRESHOLD_PIPELINE_SEGMENTS. We want to set
THRESHOLD_PIPELINE_SEGMENTS to infinite here, if it doesn't cause any
performance degradation. Then, under HBASE-16417, we should come up with a
truly optimal policy that plays with all the parameters.
bq. What prompted you to ensure that flushing the entire pipeline is better
than flushing only the tail as you were doing earlier? I think our concern was
more that flushing the tail only will create a lot of small files. Do you
observe any other thing when flushing only the tail?
Initially, with flattening only, we had too many open files, as you saw
yourself. When we introduced merge, you reported some GC problems due to
too many small indexes floating around. Additionally, without the composite
snapshot, the CompactingMemStore is never cleared by a single flush-to-disk,
unless its active segment has been empty since the previous flush-to-disk. Note
that without the composite snapshot, upon a flush-to-disk request the active
segment is pushed to the pipeline and only the pipeline's tail is flushed. So
the active segment is not flushed, unless it is empty. Thus, in order to flush
the entire CompactingMemStore to disk, you need multiple flushes, resulting in
multiple files on disk, which is not desirable. So indeed the idea of truly
emptying the store upon flush-to-disk looks good to us.
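The difference between the two policies can be sketched with a simplified model (hypothetical types and method names, not the actual CompactingMemStore API): the store holds one active segment plus a pipeline of immutable segments, and a snapshot decides how much of that goes to disk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified memstore model contrasting tail-only flush with the
// composite snapshot described in this issue.
public class SnapshotModel {
    final Deque<String> pipeline = new ArrayDeque<>(); // head = newest
    String active = "active";

    // Old behavior: push the active segment into the pipeline, then flush
    // only the tail. The fresh active and the rest of the pipeline stay in
    // memory, so one flush-to-disk yields one small file and does not
    // empty the store.
    List<String> tailOnlySnapshot() {
        pipeline.addFirst(active);
        active = "freshActive"; // a new, empty active segment
        List<String> flushed = new ArrayList<>();
        flushed.add(pipeline.removeLast()); // tail only
        return flushed;
    }

    // Composite snapshot: flush the active segment plus the entire
    // pipeline at once, leaving the store truly empty after a single
    // flush-to-disk.
    List<String> compositeSnapshot() {
        pipeline.addFirst(active);
        active = "freshActive";
        List<String> flushed = new ArrayList<>(pipeline);
        pipeline.clear();
        return flushed;
    }
}
```

With, say, two immutable segments in the pipeline, a tail-only snapshot flushes one segment and leaves two behind, while a composite snapshot flushes all three in one step; that is exactly the "multiple flushes, multiple files" asymmetry described above.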
> Flush the entire CompactingMemStore content to disk
> ---------------------------------------------------
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
> Issue Type: Sub-task
> Reporter: Anastasia Braginsky
> Assignee: Anastasia Braginsky
> Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch,
> HBASE-17081-V03.patch, Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another
> part is divided between immutable segments in the compacting pipeline. Upon
> flush-to-disk request we want to flush all of it to disk, in contrast to
> flushing only the tail of the compacting pipeline.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)