[
https://issues.apache.org/jira/browse/HBASE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201362#comment-13201362
]
Nicolas Spiegelberg commented on HBASE-5330:
--------------------------------------------
I spent a little time on this yesterday. This is correct behavior as written.
Some details:
*#1*
{code}
// Change
compactEquals(store.compactSelection(sfCreate(7,6,5,4,3,2,1)).getFilesToCompact(),
7,6,5,4,3);
// TO:
compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);
{code}
The original code runs a compaction selection, takes the files it picked, then
runs a second selection on them inside compactEquals. Here that happens to be an
identity operation, but it is not technically correct, since we're "double
compacting".
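To see why the double selection still passes, here is a simplified model of ratio-based minor selection. It is a sketch under assumptions, not HBase's Store code: plain sizes stand in for StoreFiles (oldest first), and the class name and constants (minFiles = 3, maxFiles = 5, ratio = 1.0, a minimum compaction size of 10) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of ratio-based minor compaction selection.
 * Sizes are ordered oldest first. Not the actual HBase implementation.
 */
final class SelectionSketch {
  static final int MIN_FILES = 3;          // assumed minFiles
  static final int MAX_FILES = 5;          // assumed maxFiles
  static final double RATIO = 1.0;         // assumed compaction ratio
  static final long MIN_COMPACT_SIZE = 10; // files at or below this are always eligible

  static List<Long> select(List<Long> sizes) {
    int n = sizes.size();
    int start = 0;
    // Skip old files that are larger than ratio * (sum of the newer files after them).
    while (n - start >= MIN_FILES
        && sizes.get(start) > Math.max(MIN_COMPACT_SIZE,
            (long) (sumWindow(sizes, start + 1) * RATIO))) {
      start++;
    }
    if (n - start < MIN_FILES) {
      return new ArrayList<>(); // too few files left to be worth compacting
    }
    int end = Math.min(n, start + MAX_FILES); // cap the selection at maxFiles
    return new ArrayList<>(sizes.subList(start, end));
  }

  // Sum of up to MAX_FILES - 1 sizes beginning at index 'from'.
  private static long sumWindow(List<Long> sizes, int from) {
    long sum = 0;
    for (int i = from; i < Math.min(sizes.size(), from + MAX_FILES - 1); i++) {
      sum += sizes.get(i);
    }
    return sum;
  }
}
```

In this model, selecting from 7,6,5,4,3,2,1 yields 7,6,5,4,3, and running the selection again on those five files returns the same five, so the redundant second pass is invisible in the assertion.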
*#2*
{code}
store.forceMajor = true;
compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);
{code}
Should return [3:7] because it's NOT actually doing a major compaction.
Currently, the algorithm downgrades major compactions that have too many files
to minor ones. This is not really the behavior we want. Instead, for a major
compaction, we should try to compact storefiles[0:N] where N >= min(minFiles,
sizeof(storefiles)). This will be a little tricky, because the candidate files
don't always contain storefiles[0], which is necessary for a major compaction.
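One possible reading of the proposed major-compaction rule, sketched under assumptions: always take a prefix that starts at the store's oldest file, capped at maxFiles, and give up when the candidates do not include storefiles[0]. The class and method names are hypothetical, and the sketch assumes the candidate list is ordered oldest first and contiguous with the store's file list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of prefix selection for a forced major compaction. */
final class MajorSelectionSketch {
  static <T> List<T> selectMajor(List<T> storefiles, List<T> candidates,
                                 int minFiles, int maxFiles) {
    if (minFiles > maxFiles) {
      throw new IllegalArgumentException("minFiles must not exceed maxFiles");
    }
    // A major compaction must include the oldest store file, storefiles[0].
    if (storefiles.isEmpty() || candidates.isEmpty()
        || !candidates.get(0).equals(storefiles.get(0))) {
      return new ArrayList<>(); // not eligible right now
    }
    // N = min(maxFiles, size); since maxFiles >= minFiles,
    // this also satisfies N >= min(minFiles, size).
    int n = Math.min(maxFiles, candidates.size());
    return new ArrayList<>(candidates.subList(0, n));
  }
}
```

With seven files and maxFiles = 5, this picks the first five (the [3:7] result above); if the oldest file is missing from the candidates, it selects nothing instead of silently running a minor compaction.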
*#3*
{code}
// Reference compaction
compactEquals(sfCreate(true, 7, 6, 5, 4, 3, 2, 1), 5, 4, 3, 2, 1);
{code}
This is correct as written, but still needs some improvement. As I recall, the
original reasoning was that we'd only hit this case when we had a bug where we
kept flushing storefiles. We weren't sure how to handle it at the time (we had
production pressure). The problem is that we didn't have the state of previous
compactions, and we thought we'd have to get the whole candidate set. The idea
was that, if we're going to recompact the same files multiple times, it should
be the smaller files at the end rather than the last file. Since we only need
a shard of the files for major compaction and reference files keep inherent
state, we can improve this.
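The behavior the reference test encodes (expecting 5, 4, 3, 2, 1 rather than 7, 6, 5, 4, 3) amounts to trimming from the oldest end, so the smaller, newer files are the ones that get recompacted. A minimal sketch of that trimming, with a hypothetical class and method name and files ordered oldest first:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of selection when reference files are present. */
final class ReferenceSelectionSketch {
  // Include every file up to maxFiles, dropping the oldest ones (from index 0)
  // so that the newest, smallest files at the end are kept.
  static <T> List<T> selectWithReferences(List<T> storefiles, int maxFiles) {
    int pastMax = Math.max(0, storefiles.size() - maxFiles);
    return new ArrayList<>(storefiles.subList(pastMax, storefiles.size()));
  }
}
```

Trimming from the tail end instead would reproduce the non-reference result, which is exactly the difference the new assertions make visible.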
> TestCompactSelection - adding 2 test cases to testCompactionRatio
> -----------------------------------------------------------------
>
> Key: HBASE-5330
> URL: https://issues.apache.org/jira/browse/HBASE-5330
> Project: HBase
> Issue Type: Improvement
> Reporter: Doug Meil
> Assignee: Doug Meil
> Priority: Minor
> Attachments: TestCompactSelection_hbase_5330.java.patch
>
>
> There were three existing assertions in TestCompactSelection
> testCompactionRatio that did "max # of files" assertions...
> {code}
> assertEquals(maxFiles,
>     store.compactSelection(sfCreate(7,6,5,4,3,2,1)).getFilesToCompact().size());
> {code}
> ... and for references ...
> {code}
> assertEquals(maxFiles,
>     store.compactSelection(sfCreate(true, 7,6,5,4,3,2,1)).getFilesToCompact().size());
> {code}
>
> ... but they didn't assert against which StoreFiles got selected. While the
> number of StoreFiles is the same, the files selected are actually different,
> and I thought that there should be explicit assertions showing that.
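The gap the issue describes is that a count-only assertion cannot tell 7,6,5,4,3 apart from 5,4,3,2,1. A minimal sketch of an assertion that compares the selected files by size, in order, is below; FakeStoreFile and assertSelected are hypothetical stand-ins, not the real StoreFile or compactEquals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assert which files were selected, not just how many. */
final class SelectionAssertions {
  // Stand-in for a StoreFile; only its length matters for these assertions.
  static final class FakeStoreFile {
    final long length;
    FakeStoreFile(long length) { this.length = length; }
  }

  static List<FakeStoreFile> files(long... sizes) {
    List<FakeStoreFile> out = new ArrayList<>();
    for (long s : sizes) {
      out.add(new FakeStoreFile(s));
    }
    return out;
  }

  // Fails unless the selected files have exactly the expected sizes, in order.
  static void assertSelected(List<FakeStoreFile> selected, long... expected) {
    List<Long> actual = new ArrayList<>();
    for (FakeStoreFile f : selected) {
      actual.add(f.length);
    }
    List<Long> want = new ArrayList<>();
    for (long e : expected) {
      want.add(e);
    }
    if (!actual.equals(want)) {
      throw new AssertionError("expected " + want + " but selected " + actual);
    }
  }
}
```

Both 7,6,5,4,3 and 5,4,3,2,1 contain five files, so a size()-based check accepts either; the element-wise comparison rejects the wrong one.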