[jira] [Commented] (HBASE-12657) The Region is not being split and far exceeds the desired maximum size.

Lars Hofhansl (JIRA) Mon, 08 Dec 2014 21:07:38 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238989#comment-14238989
 ]


Lars Hofhansl commented on HBASE-12657:
---------------------------------------

This code is hard to follow; it could use a few more comments.
In 0.98 I see that RatioBasedCompactionPolicy.removeExcessFiles removes the 
later (i.e. newer) files when the selection is over 
hbase.hstore.compaction.max. So it seems that the (older) reference files 
remain in the selection and hence should be compacted first...?
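
As a rough illustration of that reading (not the actual 0.98 code), the sketch below trims a candidate list to a maximum count by dropping entries from the newest end. The class ExcessFileTrim, the FileInfo type and trimToMax are hypothetical names made up for this example, and it assumes the candidates are ordered oldest to newest.

```java
import java.util.ArrayList;
import java.util.List;

class ExcessFileTrim {
    // Hypothetical stand-in for a store file; not an HBase class.
    static final class FileInfo {
        final String name;
        final boolean isReference;

        FileInfo(String name, boolean isReference) {
            this.name = name;
            this.isReference = isReference;
        }
    }

    // Assumes candidates are ordered oldest to newest. Keeps the first
    // maxFiles entries, i.e. drops the newest files when over the limit.
    static List<FileInfo> trimToMax(List<FileInfo> candidates, int maxFiles) {
        int keep = Math.max(0, Math.min(candidates.size(), maxFiles));
        return new ArrayList<>(candidates.subList(0, keep));
    }

    public static void main(String[] args) {
        List<FileInfo> candidates = List.of(
            new FileInfo("ref-a", true),
            new FileInfo("ref-b", true),
            new FileInfo("flush-1", false),
            new FileInfo("flush-2", false));
        // With a limit of 3, the newest file (flush-2) is the one dropped.
        for (FileInfo f : trimToMax(candidates, 3)) {
            System.out.println(f.name);
        }
    }
}
```

Under these assumptions both reference files survive the trim, which is what the "should be compacted first" reading relies on.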

What's interesting is RatioBasedCompactionPolicy.getCurrentEligibleFiles, which 
does this:
{code}
      // exclude all files older than the newest file we're currently
      // compacting. this allows us to preserve contiguity (HBASE-2856)
{code}

So if we have any minor compactions on the new files, we wouldn't compact the 
(older) reference files.
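
To make that concrete, here is a minimal sketch of the exclusion the comment above describes. It is not the HBase implementation: EligibleFiles, FileInfo and the compacting flag are hypothetical, and the list is assumed to be ordered oldest to newest.

```java
import java.util.ArrayList;
import java.util.List;

class EligibleFiles {
    // Hypothetical stand-in for a store file; not an HBase class.
    static final class FileInfo {
        final String name;
        final boolean isReference;
        final boolean compacting; // true if a running compaction owns this file

        FileInfo(String name, boolean isReference, boolean compacting) {
            this.name = name;
            this.isReference = isReference;
            this.compacting = compacting;
        }
    }

    // Assumes storeFiles are ordered oldest to newest. Returns only the files
    // newer than the newest file that is currently being compacted.
    static List<FileInfo> eligible(List<FileInfo> storeFiles) {
        int newestCompacting = -1;
        for (int i = 0; i < storeFiles.size(); i++) {
            if (storeFiles.get(i).compacting) {
                newestCompacting = i;
            }
        }
        return new ArrayList<>(
            storeFiles.subList(newestCompacting + 1, storeFiles.size()));
    }

    public static void main(String[] args) {
        List<FileInfo> storeFiles = List.of(
            new FileInfo("ref-a", true, false),
            new FileInfo("ref-b", true, false),
            new FileInfo("flush-1", false, true),  // minor compaction running
            new FileInfo("flush-2", false, false));
        for (FileInfo f : eligible(storeFiles)) {
            System.out.println(f.name);
        }
    }
}
```

In this example only flush-2 is eligible: the two reference files are older than the file being compacted, so they are skipped for as long as that compaction runs.
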
I don't think it's necessary to always compact all reference files away in one 
go. What *is* important is that reference files are compacted before we compact 
any newer files that arrived since the split; that way they will be collected 
after a few rounds of compaction, and that will not be delayed by new data 
coming in. From visual inspection of the code I find it very hard to verify 
that this is actually the case.
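
One way such a property could be checked in a test, rather than by reading the code, is sketched below. This is only one possible reading of the rule (as long as the store still holds reference files, every non-empty selection must include at least one of them), and ReferenceFirstCheck, FileInfo and respectsReferenceFirst are hypothetical names, not HBase APIs.

```java
import java.util.List;

class ReferenceFirstCheck {
    // Hypothetical stand-in for a store file; not an HBase class.
    static final class FileInfo {
        final String name;
        final boolean isReference;

        FileInfo(String name, boolean isReference) {
            this.name = name;
            this.isReference = isReference;
        }
    }

    // Returns true if the selection is acceptable under the assumed rule:
    // while reference files remain in the store, a non-empty selection must
    // contain at least one reference file.
    static boolean respectsReferenceFirst(List<FileInfo> storeFiles,
                                          List<FileInfo> selection) {
        boolean storeHasReferences =
            storeFiles.stream().anyMatch(f -> f.isReference);
        if (!storeHasReferences || selection.isEmpty()) {
            return true;
        }
        return selection.stream().anyMatch(f -> f.isReference);
    }

    public static void main(String[] args) {
        FileInfo ref = new FileInfo("ref-a", true);
        FileInfo flush1 = new FileInfo("flush-1", false);
        FileInfo flush2 = new FileInfo("flush-2", false);
        List<FileInfo> store = List.of(ref, flush1, flush2);

        // Selecting only post-split files violates the assumed rule.
        System.out.println(respectsReferenceFirst(store, List.of(flush1, flush2)));
        // Including the reference file satisfies it.
        System.out.println(respectsReferenceFirst(store, List.of(ref, flush1)));
    }
}
```

A stricter variant could require that all remaining reference files precede any newer file in the selection; the weaker form is used here because compacting every reference file in one go is not required.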

It's OK to attach a preliminary patch so that we can have a look, 
[~vrodionov]. You can call it WIP, UNTESTED, or EXAMPLE_ONLY (or whatever) to 
indicate that it's not a finished patch.


> The Region is not being split and far exceeds the desired maximum size.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-12657
>                 URL: https://issues.apache.org/jira/browse/HBASE-12657
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.98.8, 0.94.25, 0.99.2
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 1.0.0, 2.0.0, 0.94.26, 0.98.9
>
>
> We are seeing this behavior when creating indexes in one of our environments.
> When an index is being created, most of the "requests" go into a single 
> region.  Creating an index seems to take longer than usual, and it can take 
> days for the regions to compact and split after the index is created.
> Here is a du of the HBase index table:
> {code}
> -bash-4.1$ sudo -su hdfs hadoop fs -du /hbase/43681
> 705          /hbase/43681/.tableinfo.0000000001
> 0            /hbase/43681/.tmp
> 27981697293  /hbase/43681/0492e22092e21d35fca8e779b21ec797
> 539687093    /hbase/43681/832298c4e975fc47210feb6bac3d2f71
> 560660531    /hbase/43681/be9bdb3bdf9365afe5fe90db4247d82c
> 7081938297   /hbase/43681/cd440e524f96fbe0719b2fe969848560
> 6297860287   /hbase/43681/dc893a2d8daa08c689dc69e6bb2c5b50
> 7189607722   /hbase/43681/ffbceaea5e2f142dbe6cd4cbeacc00e8
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
