[ https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023860#comment-15023860 ]
stack commented on HBASE-14735:
-------------------------------
[~zhoushuaifeng2] So, IIRC, if there were too many storefiles, we intentionally
prevented splits. While we might split once if there are lots of files in a
Store, the way split works, if there are any reference files in a Store, we
cannot split again until the references have been cleaned up (compactions clean
up references). So, if you had a region that was filling with storefiles, you
might be able to split once, but you'd not be able to split a second time until
all the references had been cleaned out. To do that, we needed to compact as
fast as we could to remove any and all references; at the extreme we would hold
up flushing new storefiles. That's sort of how it worked/works, and it explains
some of the comments you are seeing in the code referenced by [~anoop.hbase].
So now, after [~anoop.hbase]'s questions, I'm wary of this patch. I don't think
it will really get you what you want: you might get one split, but then you'll
run into a wall because your store will have reference files and can't be split
until all of them have been removed; hence the recursive compacting, to get us
back under the blocking file count.
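To make the "one split, then a wall" point concrete, here is a minimal toy model in Java. It is not HBase code: the class `ReferenceGateSketch`, its `StoreFile` type, and all method names are hypothetical, and it assumes only what is said above, namely that a store holding any reference file cannot be split and that compacting the files removes the references.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names come from HBase.
class ReferenceGateSketch {

    // A store file is either a real file or a reference left behind by a split.
    static final class StoreFile {
        final String name;
        final boolean isReference;

        StoreFile(String name, boolean isReference) {
            this.name = name;
            this.isReference = isReference;
        }
    }

    // Assumption from the comment above: any reference file blocks a split.
    static boolean canSplit(List<StoreFile> files) {
        for (StoreFile f : files) {
            if (f.isReference) {
                return false;
            }
        }
        return true;
    }

    // After a split, a daughter sees each parent file through a reference.
    static List<StoreFile> daughterAfterSplit(List<StoreFile> parentFiles) {
        List<StoreFile> daughter = new ArrayList<>();
        for (StoreFile f : parentFiles) {
            daughter.add(new StoreFile(f.name + ".ref", true));
        }
        return daughter;
    }

    // Compacting every file rewrites them into one real file, so no references remain.
    static List<StoreFile> compactAll(List<StoreFile> files) {
        List<StoreFile> compacted = new ArrayList<>();
        if (!files.isEmpty()) {
            compacted.add(new StoreFile("compacted", false));
        }
        return compacted;
    }
}
```

In this model a daughter stays unsplittable no matter how many new flushes arrive; only compacting away the references reopens the gate, which is why compaction has to win the race against incoming files.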
What was going on in your cluster, do you know? Were compactions not able to keep
up? Would splitting have made it more likely that they could keep up? 400G and
100+ files is not good either.
> Region may grow too big and can not be split
> --------------------------------------------
>
> Key: HBASE-14735
> URL: https://issues.apache.org/jira/browse/HBASE-14735
> Project: HBase
> Issue Type: Bug
> Components: Compaction, regionserver
> Affects Versions: 1.1.2, 0.98.15
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Attachments: 14735-0.98.patch, 14735-branch-1.1.patch,
> 14735-branch-1.2.patch, 14735-branch-1.2.patch, 14735-master (2).patch,
> 14735-master.patch, 14735-master.patch
>
>
> When a compaction completes, there may still be many storefiles in the store.
> If CompactPriority <= 0, compactSplitThread then issues a "Recursive
> enqueue" compaction request instead of requesting a split:
> {code:title=CompactSplitThread.java|borderStyle=solid}
> if (completed) {
>   // degenerate case: blocked regions require recursive enqueues
>   if (store.getCompactPriority() <= 0) {
>     requestSystemCompaction(region, store, "Recursive enqueue");
>   } else {
>     // see if the compaction has caused us to exceed max region size
>     requestSplit(region);
>   }
> }
> {code}
> But in some situations, the "recursive enqueue" request may return null and
> not create a new compaction runner. For example, if another compaction of the
> same region is running, compaction selection will exclude all files older
> than the newest file currently being compacted, so there may not be enough
> files for the "recursive enqueue" request to select. When this happens, a split
> will not be triggered. If the input load is high enough, compactions are always
> running on the region, and a split will never be triggered.
> In our cluster, this situation happened, and a huge region of more than 400GB
> with 100+ storefiles appeared. The version is 0.98.10, and trunk also has the
> problem.
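
The branch described in the quoted report can be sketched as a small decision function. This is an illustrative model, not the CompactSplitThread code: `RecursiveEnqueueSketch`, `Outcome`, and both methods are hypothetical names, and `selectionFoundFiles` stands in for whether the "recursive enqueue" request returned a non-null compaction. The second method shows one possible fallback for comparison; whether it matches the attached patches is not stated here.

```java
// Toy model only; none of these names come from HBase.
class RecursiveEnqueueSketch {

    enum Outcome { COMPACTION_QUEUED, SPLIT_REQUESTED, NOTHING }

    // Behavior as described in the report: a blocked store (priority <= 0)
    // only tries to re-enqueue a compaction and never reaches requestSplit.
    static Outcome describedBehavior(int compactPriority, boolean selectionFoundFiles) {
        if (compactPriority <= 0) {
            // If selection finds nothing to compact, nothing at all happens.
            return selectionFoundFiles ? Outcome.COMPACTION_QUEUED : Outcome.NOTHING;
        }
        // SPLIT_REQUESTED means a split was requested, not that one occurred.
        return Outcome.SPLIT_REQUESTED;
    }

    // Hypothetical fallback: ask for a split when the re-enqueue selects nothing.
    static Outcome withSplitFallback(int compactPriority, boolean selectionFoundFiles) {
        Outcome outcome = describedBehavior(compactPriority, selectionFoundFiles);
        return outcome == Outcome.NOTHING ? Outcome.SPLIT_REQUESTED : outcome;
    }
}
```

Even with such a fallback, a requested split can still be refused while the store holds reference files, which is the concern raised in the comment above.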