[ 
https://issues.apache.org/jira/browse/HBASE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023860#comment-15023860
 ] 

stack commented on HBASE-14735:
-------------------------------

[~zhoushuaifeng2] So, IIRC, if there are too many storefiles, we intentionally 
prevent the split. While we might split once when a Store has lots of files, 
the way split works, a Store that holds any reference files cannot be split 
again until those references have been cleaned up (compactions clean up 
references). So, if you had a region that is filling with storefiles, you 
might be able to split once, but you'd not be able to split a second time 
until all the references had been cleaned out. To do that, we need to compact 
as fast as we can to remove any and all references; at the extreme we hold up 
flushing new storefiles. That's sort of how it worked/works, and it explains 
some of the comments you are seeing in the code referenced by [~anoop.hbase]. 
So, now, after [~anoop.hbase]'s questions, I'm wary of this patch. I don't 
think it will really get you what you want: you might get one split, but then 
you'll run into a wall because your store will have reference files and can't 
be split until all of them have been removed; i.e. recursive compacting to get 
us back under the blocking file count.
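
To make the point about reference files concrete, here is a minimal toy model, not HBase code. The class and method names (SplitGateSketch, ToyStore, ToyFile, canSplit, compactAll) are all made up for illustration; the only behavior assumed is what is described above: a store with any reference file cannot be split, and a compaction rewrites the files and drops the references.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HBase code): a store is splittable only when it holds no reference files. */
public class SplitGateSketch {

    /** Hypothetical stand-in for a storefile; isReference marks a half-file left behind by a split. */
    static final class ToyFile {
        final boolean isReference;

        ToyFile(boolean isReference) {
            this.isReference = isReference;
        }
    }

    /** Hypothetical stand-in for a Store holding a list of files. */
    static final class ToyStore {
        final List<ToyFile> files = new ArrayList<>();

        /** A single reference file is enough to block the next split. */
        boolean canSplit() {
            for (ToyFile f : files) {
                if (f.isReference) {
                    return false;
                }
            }
            return true;
        }

        /** Models a compaction that rewrites every file into one regular file, dropping references. */
        void compactAll() {
            files.clear();
            files.add(new ToyFile(false));
        }
    }

    public static void main(String[] args) {
        ToyStore daughter = new ToyStore();
        daughter.files.add(new ToyFile(true));   // reference left behind by the first split
        daughter.files.add(new ToyFile(false));  // freshly flushed file
        System.out.println("can split before compaction: " + daughter.canSplit());
        daughter.compactAll();
        System.out.println("can split after compaction: " + daughter.canSplit());
    }
}
```

In this sketch the daughter store cannot split until compactAll() has run, which is the wall described above: under heavy write load the compactions have to win the race before a second split is possible.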

What was going on in your cluster, do you know? Were compactions not able to keep 
up? Would splitting have made it more likely that they could keep up? 400G and 
100+ files is not good either.

> Region may grow too big and can not be split
> --------------------------------------------
>
>                 Key: HBASE-14735
>                 URL: https://issues.apache.org/jira/browse/HBASE-14735
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, regionserver
>    Affects Versions: 1.1.2, 0.98.15
>            Reporter: Shuaifeng Zhou
>            Assignee: Shuaifeng Zhou
>         Attachments: 14735-0.98.patch, 14735-branch-1.1.patch, 
> 14735-branch-1.2.patch, 14735-branch-1.2.patch, 14735-master (2).patch, 
> 14735-master.patch, 14735-master.patch
>
>
> When a compaction completes, there may still be many storefiles in the store. 
> If CompactPriority <= 0, compactSplitThread then issues a "Recursive 
> enqueue" compaction request instead of requesting a split:
> {code:title=CompactSplitThread.java|borderStyle=solid}
>         if (completed) {
>           // degenerate case: blocked regions require recursive enqueues
>           if (store.getCompactPriority() <= 0) {
>             requestSystemCompaction(region, store, "Recursive enqueue");
>           } else {
>             // see if the compaction has caused us to exceed max region size
>             requestSplit(region);
>           }
>         }
> {code}
> But in some situations the "recursive enqueue" request may return null and 
> not build a new compaction runner. For example, when another compaction of 
> the same region is running, compaction selection excludes all files older 
> than the newest file currently being compacted, so the "recursive enqueue" 
> request may not find enough files to select. When this happens, a split is 
> not triggered. If the input load is high enough, compactions are always 
> running on the region, and a split is never triggered.
> This happened in our cluster: a huge region of more than 400GB with 100+ 
> storefiles appeared. The version is 0.98.10, and trunk also has the 
> problem.
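
The gap described in the issue can be sketched as a small decision function. This is a toy model, not HBase code: the names (RecursiveEnqueueSketch, NextAction, afterCompaction) and the boolean selectionFindsFiles are assumptions made for illustration, and compactPriority only stands in for the value returned by store.getCompactPriority() in the quoted snippet.

```java
/** Toy model (not HBase code) of the post-compaction decision described in the issue. */
public class RecursiveEnqueueSketch {

    /** What happens next for the region after a compaction completes. */
    enum NextAction { COMPACT_AGAIN, REQUEST_SPLIT, NOTHING }

    /**
     * @param compactPriority     hypothetical stand-in for store.getCompactPriority()
     * @param selectionFindsFiles whether the recursive request would select enough files
     */
    static NextAction afterCompaction(int compactPriority, boolean selectionFindsFiles) {
        if (compactPriority <= 0) {
            // Blocked store: only a recursive compaction is requested, never a split.
            // If selection comes back empty, nothing at all is scheduled.
            return selectionFindsFiles ? NextAction.COMPACT_AGAIN : NextAction.NOTHING;
        }
        // Store is under the blocking file count: check whether the region should split.
        return NextAction.REQUEST_SPLIT;
    }

    public static void main(String[] args) {
        System.out.println(afterCompaction(3, true));    // healthy store
        System.out.println(afterCompaction(-5, true));   // blocked, files available
        System.out.println(afterCompaction(-5, false));  // blocked, selection empty
    }
}
```

The third case is the one reported here: the store is blocked, the recursive request selects nothing, and neither a compaction nor a split is queued for that completion.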



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
