[
https://issues.apache.org/jira/browse/HBASE-26229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407977#comment-17407977
]
Xiaolin Ha edited comment on HBASE-26229 at 9/6/21, 10:58 AM:
--------------------------------------------------------------
Thanks, [~zhangduo], got it now.
I think the idea is practical, but a little complex. There are two related
problems with the idea that need to be considered.
One is that the stripe info is currently held only in cache, with no
persistence, and stores in the same region may have different stripes. So at
the step of splitting files for bulk load, each file should be split into
stripes; how can we get the stripe boundaries?
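To make the dependency on boundaries concrete, here is a minimal sketch of how a bulk load file's row range would map onto stripes once the boundaries are known. It is only an illustration, not HBase code: the class and method names are hypothetical, rows are modeled as String instead of byte[], and stripe i is assumed to cover [endRows[i-1], endRows[i]) with the last stripe open-ended.

```java
/** Hypothetical helper, not part of HBase: locate stripes from sorted stripe end rows. */
final class StripeLookup {
  private StripeLookup() {}

  /**
   * Returns the index of the stripe that contains the given row.
   * Assumption: endRows is sorted ascending and holds the exclusive end row of every
   * stripe except the last one, which is open-ended.
   */
  static int stripeOf(String[] endRows, String row) {
    int lo = 0;
    int hi = endRows.length; // index of the last, open-ended stripe
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (row.compareTo(endRows[mid]) < 0) {
        hi = mid; // row sorts before this end row, so it is in stripe mid or earlier
      } else {
        lo = mid + 1; // row is at or after this end row, so it is in a later stripe
      }
    }
    return lo;
  }

  /** Returns {firstStripe, lastStripe} overlapped by a file covering [firstRow, lastRow]. */
  static int[] stripesCovered(String[] endRows, String firstRow, String lastRow) {
    return new int[] { stripeOf(endRows, firstRow), stripeOf(endRows, lastRow) };
  }
}
```

With end rows {"g", "p"} there are three stripes, and a file covering rows "c" to "q" overlaps stripes 0 through 2, so it would have to be split into three pieces. The sketch takes the end rows as input, which is exactly the part that is hard today because they are not persisted.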
Another is how to solve data skew problems in stripes, since a stripe split
needs to compact all files in the stripe in one compaction request.
If we bulk load files into stripes according to the current stripe info, the
same problem as the large L0 in this issue (too many files in L0 need to be
compacted) may happen in one stripe (too many files in one stripe need to be
compacted, and the stripe size is very large before it can be split by
compacting). How can we split one stripe without compacting all files in it in
one compaction request? Or how can we make compaction happen as early as
possible, before the stripe is too large to compact?
was (Author: xiaolin ha):
Thanks, [~zhangduo], got it now.
I think the idea is practical, but a little complex. There are two related
problems with the idea that need to be considered.
One is that the stripe info is currently held only in cache, with no
persistence, and stores in the same region may have different stripes. So at
the step of splitting files for bulk load, each file should be split into
stripes; how can we get the stripe boundaries?
Another is that if we bulk load files into stripes according to the current
stripe info, the same problem as the large L0 (too many files in L0 need to be
compacted) may happen in one stripe (too many files in one stripe need to be
compacted, and the stripe size is very large before it can be split by
compacting), but splitting a stripe also needs to compact all the files in it
according to the current code. How can we split one stripe without compacting
all files in it in one compaction?
> Limit count and size of L0 files compaction in StripeCompactionPolicy
> ---------------------------------------------------------------------
>
> Key: HBASE-26229
> URL: https://issues.apache.org/jira/browse/HBASE-26229
> Project: HBase
> Issue Type: Improvement
> Components: Compaction
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Attachments: after.png, before.png
>
>
> When selecting L0 files to compact in the stripe store file manager, all of
> them are selected. This is the key problem. There is currently no file count
> control and no compaction size control for L0 file compactions. If the
> compaction size is large, e.g. some TBs, then the L0 compaction will need a
> lot of time to complete.
> L0 contains not only the recently flushed files; bulk loaded files are also
> put into L0. What's more, when opening a daughter region, if the parent
> stripes cannot be rebuilt in the daughter, all the files are put into L0.
> So when there are large enough files in L0, there will be a quite long
> compaction for all the L0 files. If the compaction speed is less than the
> speed at which files are flushed to L0, the subsequent compactions become
> even larger. This is a big problem, especially when bulk loading files.
>
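As an illustration of the kind of limit the issue title asks for, here is a minimal sketch that caps one L0 compaction selection by both file count and total size. It is not the actual StripeCompactionPolicy change: the class name, the method, and the maxFiles and maxBytes parameters are hypothetical, and files are modeled only by their sizes in the order they would be compacted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not HBase code: bound one L0 selection by count and bytes. */
final class BoundedL0Selection {
  private BoundedL0Selection() {}

  /**
   * Selects a leading run of L0 files, stopping at maxFiles files or maxBytes total.
   *
   * @param fileSizes sizes in bytes of the L0 files, in compaction order (assumed oldest first)
   * @param maxFiles hypothetical cap on the number of files per compaction request
   * @param maxBytes hypothetical cap on the total bytes per compaction request
   * @return indexes of the selected files
   */
  static List<Integer> select(long[] fileSizes, int maxFiles, long maxBytes) {
    List<Integer> selected = new ArrayList<>();
    long total = 0;
    for (int i = 0; i < fileSizes.length && selected.size() < maxFiles; i++) {
      // Always accept the first file, so one oversized file cannot block L0 forever.
      if (!selected.isEmpty() && total + fileSizes[i] > maxBytes) {
        break;
      }
      selected.add(i);
      total += fileSizes[i];
    }
    return selected;
  }
}
```

For sizes {10, 20, 30, 40} with maxFiles = 3 and maxBytes = 100, the first three files are selected and the fourth waits for a later request. The sketch takes a contiguous leading run instead of picking arbitrary files so that the relative order of the remaining files is left untouched; whether that constraint is needed depends on the real policy.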