[ 
https://issues.apache.org/jira/browse/HBASE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-7763:
---------------------------------

    Attachment: HBASE-7763-trunk-TESTING.patch

Yeah, I removed it because I've pretty much found out that it's not enough.  I 
thought I had a simple solution (which I was going to post instead), but this is 
more complex than I had previously thought.

The current selection algorithm will select the largest files.  So sorting by 
size doesn't really cut down on the number of compactions.  This was done so 
that compactions will have a supposedly better chance of not picking up the 
same files over and over again.

So I tried changing that.  It was better in some cases and worse in others.

Also, I think there are some issues with the way files are chosen with regard 
to the ratio.  All files smaller than or older than (depending upon sorting) 
the first file that does not break the ratio are chosen.  However, you can think 
up a set of files where the first file that doesn't break the ratio is the 
wrong border to pick.

For example:

9999, 107, 50, 10, 10, 10, 10

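50 > 40 * 1.2 (40 being the sum of the four 10s), so I would think that the 50 
file shouldn't be included in any compaction.  However, it will be, because the 
107 file passes the ratio check against everything after it (50 + 40 = 90):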

107 < 90 * 1.2
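
To make the example above concrete, here is a rough Python sketch of the border selection as I described it.  This is not the actual HBase code: the function names are made up for illustration, the 1.2 ratio is taken from the arithmetic above, and things like the minimum compaction size and the maximum number of files per compaction are ignored.

```python
def select_by_first_fit(sizes, ratio=1.2):
    """Hypothetical sketch: walk the files in order and pick the first one
    whose size does not exceed ratio * (sum of all files after it).
    That file and everything after it are selected."""
    for i, size in enumerate(sizes):
        rest = sum(sizes[i + 1:])
        if size <= ratio * rest:
            return sizes[i:]
    return []


def files_breaking_ratio(selected, ratio=1.2):
    """Return the selected files that, checked on their own, are larger than
    ratio * (sum of the files after them). The last file is skipped because
    it has nothing after it to compare against."""
    return [
        selected[i]
        for i in range(len(selected) - 1)
        if selected[i] > ratio * sum(selected[i + 1:])
    ]


files = [9999, 107, 50, 10, 10, 10, 10]
selected = select_by_first_fit(files)
offenders = files_breaking_ratio(selected)

print(selected)   # the border lands on the 107 file
print(offenders)  # the 50 file is selected even though it breaks the ratio
```

The second function is only there to show the problem: the border check is done once, at the 107 file, and nothing after the border is checked again.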

So I'm working on getting a better understanding of what these small changes 
will do.

Attached is the patch that I'm using to test.  I still need to add more tests 
and more policies to test the ratio issue.
                
> Compactions not sorting based on size anymore.
> ----------------------------------------------
>
>                 Key: HBASE-7763
>                 URL: https://issues.apache.org/jira/browse/HBASE-7763
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.6
>
>         Attachments: HBASE-7763-trunk-TESTING.patch
>
>
> Currently compaction selection is not sorting based on size.  This causes 
> selection to choose larger files to re-write than are needed when bulk loads 
> are involved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
