[
https://issues.apache.org/jira/browse/HBASE-26229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407977#comment-17407977
]
Xiaolin Ha edited comment on HBASE-26229 at 9/6/21, 10:44 AM:
--------------------------------------------------------------
Thanks, [~zhangduo], got it now.
I think the idea is practical, but a little complex. There are two related
problems with the idea that need to be considered.
One is that the stripe info is currently kept only in cache, with no
persistence, and stores in the same region may have different stripes. So at
the step where files are split for bulk load, each file should be split into
stripes; how can we get the stripe boundaries?
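To illustrate the splitting step only: if the stripe boundaries were available, mapping each row of a bulk-load file to a stripe would be a binary search over the sorted end rows. The sketch below is not taken from the HBase code. The class `StripeLookup`, the method `findStripeIndex`, and the assumption that the boundaries arrive as a sorted array of exclusive end rows (with the last stripe open-ended) are all hypothetical. It does not address where the boundaries come from, which is the open question above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StripeLookup {
    /**
     * Hypothetical helper. endRows is assumed to hold the exclusive end row of
     * every stripe except the last, sorted ascending; the last stripe is open-ended.
     * Returns the index of the stripe that would contain the given row.
     */
    public static int findStripeIndex(byte[][] endRows, byte[] row) {
        int lo = 0;
        int hi = endRows.length; // the answer lies in [lo, hi]
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            // Unsigned lexicographic comparison of the raw bytes.
            if (Arrays.compareUnsigned(row, endRows[mid]) < 0) {
                hi = mid;     // row sorts before this end row: stripe mid or earlier
            } else {
                lo = mid + 1; // row is at or past this end row: a later stripe
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Two end rows define three stripes: [.., "g"), ["g", "p"), ["p", ..)
        byte[][] endRows = {
            "g".getBytes(StandardCharsets.UTF_8),
            "p".getBytes(StandardCharsets.UTF_8)
        };
        for (String r : new String[] {"a", "g", "z"}) {
            int idx = findStripeIndex(endRows, r.getBytes(StandardCharsets.UTF_8));
            System.out.println(r + " -> stripe " + idx);
        }
    }
}
```

Because stores in the same region may have different stripes, a lookup like this would have to run per store, each with its own boundaries.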
The other is that if we bulk load files into stripes according to the current
stripe info, the same problem as with a large L0 (too many files in L0 need to
be compacted) may happen in one stripe (too many files in one stripe need to be
compacted, and the stripe can grow very large before a compaction splits it),
yet splitting a stripe also requires compacting all the files in it according
to the current code.
How can we split one stripe without compacting all the files in it?
was (Author: xiaolin ha):
Thanks, [~zhangduo], got it now.
I think the idea is practical, but a little complex. There are two related
problems with the idea that need to be considered. One is that the stripe info
is currently kept only in cache, with no persistence, and stores in the same
region may have different stripes.
The other is that if we bulk load files into stripes according to the current
stripe info, the same problem as with a large L0 may happen in one stripe, yet
splitting a stripe also requires compacting all the files in it according to
the current code (this problem can be resolved with some improvements).
> Limit count and size of L0 files compaction in StripeCompactionPolicy
> ---------------------------------------------------------------------
>
> Key: HBASE-26229
> URL: https://issues.apache.org/jira/browse/HBASE-26229
> Project: HBase
> Issue Type: Improvement
> Components: Compaction
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Attachments: after.png, before.png
>
>
> When selecting L0 files to compact in the stripe store file manager, all of
> them are selected. This is the key problem: there is currently no file count
> limit and no compaction size limit for L0 compactions. If the compaction
> size is large, e.g. several TB, the L0 compaction will take a long time to
> complete.
> L0 contains not only the recently flushed files; bulk loaded files are also
> put into L0. What's more, when opening a daughter region, if the parent
> stripes cannot be rebuilt in the daughter, all the files are put into L0.
> So when L0 has accumulated enough data, compacting all of the L0 files takes
> quite a long time. If the compaction speed is lower than the rate at which
> files are flushed to L0, the following compactions become even larger. This
> is a big problem, especially when bulk loading files.
>
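As a rough illustration of the limit the issue title asks for, the sketch below caps an L0 selection by both file count and total size. It is not the actual StripeCompactionPolicy change. The class `L0Selection`, the method `selectL0`, and the parameters `maxFiles` and `maxBytes` are hypothetical names, files are represented only by their sizes in bytes, and the input is assumed to be ordered oldest first.

```java
import java.util.ArrayList;
import java.util.List;

public class L0Selection {
    /**
     * Hypothetical selection: take a prefix of the L0 files (assumed oldest first)
     * that stays within maxFiles and maxBytes. These limits are illustrative and
     * are not HBase configuration keys.
     */
    public static List<Long> selectL0(List<Long> l0FileSizes, int maxFiles, long maxBytes) {
        List<Long> selected = new ArrayList<>();
        long total = 0;
        for (long size : l0FileSizes) {
            if (selected.size() >= maxFiles) {
                break; // file count limit reached
            }
            // Always accept the first file so that a single oversized file
            // cannot block L0 compaction forever.
            if (!selected.isEmpty() && total + size > maxBytes) {
                break; // size limit would be exceeded
            }
            selected.add(size);
            total += size;
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(10L, 20L, 30L, 40L);
        System.out.println(selectL0(sizes, 3, 45L));
    }
}
```

With limits like these, a large backlog in L0 would be worked off over several bounded compactions instead of one compaction over every L0 file.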