[
https://issues.apache.org/jira/browse/HBASE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238989#comment-14238989
]
Lars Hofhansl commented on HBASE-12657:
---------------------------------------
This code is hard to follow. Could use a few more comments.
In 0.98 I see in RatioBasedCompactionPolicy.removeExcessFiles that it removes
later (i.e. newer) files when the selection is over
hbase.hstore.compaction.max. So it seems that (older) reference files remain
in the selection and hence should be compacted first...?
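To make that reading concrete, here is a minimal sketch of the trimming as I understand it. It is a simplified model, not the actual HBase code: the class RemoveExcessSketch, the helper removeExcess and the file names are made up for illustration, and the list is assumed to be ordered oldest first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveExcessSketch {
    // Hypothetical helper: candidates are assumed to be ordered oldest first.
    // When there are more than maxFiles candidates, the tail (the newest
    // files) is dropped, so the oldest files stay in the selection.
    static List<String> removeExcess(List<String> candidates, int maxFiles) {
        List<String> selection = new ArrayList<>(candidates);
        if (selection.size() > maxFiles) {
            selection.subList(maxFiles, selection.size()).clear();
        }
        return selection;
    }

    public static void main(String[] args) {
        // Two (older) reference files followed by three newer files.
        List<String> files =
            Arrays.asList("ref-1", "ref-2", "new-3", "new-4", "new-5");
        // With a limit of 3, the two newest files are removed.
        System.out.println(removeExcess(files, 3)); // prints [ref-1, ref-2, new-3]
    }
}
```

Under these assumptions the reference files survive the trimming, which is why I would expect them to be compacted first.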
What's interesting is RatioBasedCompactionPolicy.getCurrentEligibleFiles, which
does this:
{code}
// exclude all files older than the newest file we're currently
// compacting. this allows us to preserve contiguity (HBASE-2856)
{code}
So if we have any minor compactions on the new files, we wouldn't compact the
(older) reference files.
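A rough sketch of why that matters, again as a simplified model rather than the real implementation: EligibleFilesSketch, eligible and the file names are hypothetical, and both lists are assumed to be ordered oldest first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EligibleFilesSketch {
    // Hypothetical helper: drops every store file up to and including the
    // newest file that is currently being compacted, so only files newer
    // than that one remain eligible for the next selection.
    static List<String> eligible(List<String> storeFiles, List<String> compacting) {
        List<String> candidates = new ArrayList<>(storeFiles);
        if (!compacting.isEmpty()) {
            String newestCompacting = compacting.get(compacting.size() - 1);
            int idx = candidates.indexOf(newestCompacting);
            // idx is -1 if the file is unknown; then nothing is removed.
            candidates.subList(0, idx + 1).clear();
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<String> storeFiles =
            Arrays.asList("ref-1", "ref-2", "new-3", "new-4", "new-5");
        // A minor compaction of two newer files is already running.
        List<String> compacting = Arrays.asList("new-3", "new-4");
        // The older reference files are excluded along with the files in flight.
        System.out.println(eligible(storeFiles, compacting)); // prints [new-5]
    }
}
```

In this model, as long as a compaction of newer files is in flight, ref-1 and ref-2 never show up as candidates.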
I don't think it's necessary to always compact all reference files away in one
go. What *is* important is that reference files are compacted before we compact
any newer files that arrived since the split. That way they will be collected
after a few rounds of compaction, and that will not be delayed by new data
coming in.
From visual inspection of the code I find it very hard to verify that that is
actually the case.
It's OK to attach a preliminary patch so that we can have a look, [~vrodionov].
You can call it WIP, UNTESTED, or EXAMPLE_ONLY (or whatever) to indicate that
it's not a finished patch.
> The Region is not being split and far exceeds the desired maximum size.
> -----------------------------------------------------------------------
>
> Key: HBASE-12657
> URL: https://issues.apache.org/jira/browse/HBASE-12657
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.98.8, 0.94.25, 0.99.2
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 1.0.0, 2.0.0, 0.94.26, 0.98.9
>
>
> We are seeing this behavior when creating indexes in one of our environments.
> When an index is being created, most of the "requests" go into a single
> region. The amount of time to create an index seems to take longer than
> usual and it can take days for the regions to compact and split after the
> index is created.
> Here is a du of the HBase index table:
> {code}
> -bash-4.1$ sudo -su hdfs hadoop fs -du /hbase/43681
> 705 /hbase/43681/.tableinfo.0000000001
> 0 /hbase/43681/.tmp
> 27981697293 /hbase/43681/0492e22092e21d35fca8e779b21ec797
> 539687093 /hbase/43681/832298c4e975fc47210feb6bac3d2f71
> 560660531 /hbase/43681/be9bdb3bdf9365afe5fe90db4247d82c
> 7081938297 /hbase/43681/cd440e524f96fbe0719b2fe969848560
> 6297860287 /hbase/43681/dc893a2d8daa08c689dc69e6bb2c5b50
> 7189607722 /hbase/43681/ffbceaea5e2f142dbe6cd4cbeacc00e8
> ...
> {code}