[
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701629#comment-15701629
]
Anastasia Braginsky commented on HBASE-17081:
---------------------------------------------
Thank you for your insights, [~ram_krish]!
bq. What I found was that with only flushing the tail anything more than 6
Do you mean with merges? That is, merging every 6 segments in the pipeline and flushing only the tail?
It is reasonable that you got "too many store files" then. It should not happen
with the composite snapshot.
On average, a flush-to-disk is needed after every 4 in-memory flushes. Thus, if
THRESHOLD_PIPELINE_SEGMENTS is higher than 5, merges should be rare, unless
the entire system is under stress.
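To make the arithmetic concrete, here is a toy model (not actual HBase code; the class names, the 4:1 flush ratio, and the threshold value of 5 are assumptions taken from the discussion above): every in-memory flush pushes one immutable segment into the pipeline, a merge would trigger only when the segment count exceeds THRESHOLD_PIPELINE_SEGMENTS, and a composite-snapshot flush-to-disk drains the whole pipeline.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the compaction pipeline. With a flush-to-disk after every
// 4th in-memory flush and a threshold of 5, the pipeline never grows past
// 4 segments, so merge() is never triggered.
public class PipelineModel {
    static final int THRESHOLD_PIPELINE_SEGMENTS = 5; // assumed threshold
    private final Deque<String> pipeline = new ArrayDeque<>(); // head = newest
    private int mergeCount = 0;

    void inMemoryFlush(String segment) {
        pipeline.addFirst(segment);
        if (pipeline.size() > THRESHOLD_PIPELINE_SEGMENTS) {
            merge();
        }
    }

    void merge() {
        // Collapse all pipeline segments into one flat segment.
        String merged = String.join("+", pipeline);
        pipeline.clear();
        pipeline.addFirst(merged);
        mergeCount++;
    }

    void flushToDiskComposite() {
        // Composite snapshot: the whole pipeline goes to disk at once.
        pipeline.clear();
    }

    int segments() { return pipeline.size(); }
    int merges() { return mergeCount; }

    public static void main(String[] args) {
        PipelineModel m = new PipelineModel();
        for (int i = 0; i < 100; i++) {
            m.inMemoryFlush("seg" + i);
            if (i % 4 == 3) {             // every 4th in-memory flush...
                m.flushToDiskComposite(); // ...is followed by a flush-to-disk
            }
        }
        System.out.println("merges=" + m.merges()); // prints merges=0
    }
}
```

Under these assumed numbers the merge path is never exercised; merges would only appear if flush-to-disk requests lagged further behind the in-memory flushes.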
bq. The one thing that could be a problem is that when we have scans then we
need to scan 10 segments
This JIRA is intended to provide a *mechanism of composite snapshot* without
*optimizing THRESHOLD_PIPELINE_SEGMENTS*. Under HBASE-16417, Eshcar is
running experiments with an infinite THRESHOLD_PIPELINE_SEGMENTS. We want to set
THRESHOLD_PIPELINE_SEGMENTS to infinite here, if it doesn't cause any
performance degradation. Then, under HBASE-16417, we should come up with a
truly optimal policy that plays with all the parameters.
bq. What prompted you to ensure that flushing the entire pipeline is better
than flushing only the tail as you were doing earlier? I think our concern was
more that flushing the tail only will create a lot of small files. Do you
observe any other thing when flushing only the tail?
Initially, with flattening only, we had too many open files, as you saw
yourself. When we introduced merge, you reported some GC problems due to
too many small indexes floating around. Additionally, without the composite
snapshot, the CompactingMemStore is never cleared by a single flush-to-disk,
unless its active segment has been empty since the previous flush-to-disk. Note
that without the composite snapshot, upon a flush-to-disk request the active
segment is pushed to the pipeline and only the pipeline's tail is flushed. So
the active segment is not flushed, unless it is empty. Thus, in order to flush
the entire CompactingMemStore to disk, you need multiple flushes, resulting in
multiple files on disk, which is not desirable. So indeed the idea of truly
emptying the store upon flush-to-disk looks good to us.
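The difference between the two policies can be sketched with a simplified model (hypothetical types and method names, not the actual CompactingMemStore API): the store holds one active segment plus a pipeline of immutable segments, and a snapshot decides how much of that goes to disk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified memstore model contrasting tail-only flush with the
// composite snapshot described in this issue.
public class SnapshotModel {
    final Deque<String> pipeline = new ArrayDeque<>(); // head = newest
    String active = "active";

    // Old behavior: push the active segment into the pipeline, then flush
    // only the tail. The fresh active and the rest of the pipeline stay in
    // memory, so one flush-to-disk yields one small file and does not
    // empty the store.
    List<String> tailOnlySnapshot() {
        pipeline.addFirst(active);
        active = "freshActive"; // a new, empty active segment
        List<String> flushed = new ArrayList<>();
        flushed.add(pipeline.removeLast()); // tail only
        return flushed;
    }

    // Composite snapshot: flush the active segment plus the entire
    // pipeline at once, leaving the store truly empty after a single
    // flush-to-disk.
    List<String> compositeSnapshot() {
        pipeline.addFirst(active);
        active = "freshActive";
        List<String> flushed = new ArrayList<>(pipeline);
        pipeline.clear();
        return flushed;
    }
}
```

With, say, two immutable segments in the pipeline, a tail-only snapshot flushes one segment and leaves two behind, while a composite snapshot flushes all three in one step; that is exactly the "multiple flushes, multiple files" asymmetry described above.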
> Flush the entire CompactingMemStore content to disk
> ---------------------------------------------------
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
> Issue Type: Sub-task
> Reporter: Anastasia Braginsky
> Assignee: Anastasia Braginsky
> Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch,
> HBASE-17081-V03.patch, Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another
> part is divided between immutable segments in the compacting pipeline. Upon
> flush-to-disk request we want to flush all of it to disk, in contrast to
> flushing only the tail of the compacting pipeline.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)