[
https://issues.apache.org/jira/browse/HBASE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572862#comment-13572862
]
Sergey Shelukhin commented on HBASE-7763:
-----------------------------------------
1) The book seems to disagree about seqNum:
"To overwrite an existing value, do a put at exactly the same row, column, and
version as that of the cell you would overshadow."
"If multiple writes to a cell have the same version, are all versions
maintained or just the last? ... Currently, only the last written is fetchable."
This may not be a big deal to change, but perhaps someone else can comment;
I had been assuming this behavior matters when reasoning about compactions.
2) Is there a particular reason to run exactly two iterations of selection?
Could it instead run until it stops compacting, or until the number of files
returns to some baseline? Also, the -6k/-4k file counts are hard to judge
without a baseline.
3) +1 on taking the smallest files in case of max files limitation.
4) In your 100000000-900-1 example, I would argue that the 900 and 1 files are
similar in light of the 100000000 file. This is really a question of whether
you want more I/O and more files on average, but smaller compactions; or less
I/O and fewer files, but larger compactions. I am not an expert on customer
scenarios; I wonder whether L/R could be made configurable?
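To make points 2 to 4 concrete, here is a minimal sketch of size-sorted selection. It is not the HBASE-7763 patch or HBase's actual selection code: the function names `select_by_size` and `compact_until_stable`, the ratio rule, and the defaults are all hypothetical. It assumes candidates are sorted by size (smallest first), that a file joins the selection only while it is at most `ratio` times the total of the files already chosen, and that a compaction's output is simply the sum of its inputs.

```python
from typing import List


def select_by_size(sizes: List[int], ratio: float = 1.2,
                   min_files: int = 2, max_files: int = 10) -> List[int]:
    """Hypothetical size-sorted selection, smallest files first.

    A file is added only while its size is at most `ratio` times the total
    size of the files already selected. Returns [] if fewer than
    `min_files` files qualify.
    """
    ordered = sorted(sizes)  # sort by size rather than by age/seqNum
    selected: List[int] = []
    total = 0
    for size in ordered:
        if len(selected) >= max_files:
            break  # cap reached: the smallest files are the ones kept (point 3)
        if selected and size > ratio * total:
            break  # this file dwarfs everything picked so far
        selected.append(size)
        total += size
    return selected if len(selected) >= min_files else []


def compact_until_stable(sizes: List[int], **kwargs) -> List[int]:
    """Repeat selection until nothing qualifies (cf. point 2).

    Assumption: a compaction's output size is the sum of its inputs
    (no deletes or duplicate versions are dropped).
    """
    files = list(sizes)
    while True:
        chosen = select_by_size(files, **kwargs)
        if len(chosen) < 2:
            # Nothing to do, or a one-file "compaction" that would not
            # reduce the file count: stop here.
            return sorted(files)
        for size in chosen:
            files.remove(size)
        files.append(sum(chosen))
```

With the default ratio, the 100000000-900-1 set compacts nothing, because 900 is far more than 1.2 times 1. Raising `ratio` to 1000 merges the 900 and 1 files while still leaving the 100000000 file alone. That is the more-I/O, fewer-files end of the trade-off in point 4, which is why exposing the ratio as a setting seems reasonable.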
Also, Facebook was trying to solve a similar problem with tier-based
compaction (HBASE-6371, HBASE-7055), where files are selected based on their
characteristics, for example size.
> Compactions not sorting based on size anymore.
> ----------------------------------------------
>
> Key: HBASE-7763
> URL: https://issues.apache.org/jira/browse/HBASE-7763
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.96.0, 0.94.4
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Priority: Critical
> Fix For: 0.96.0, 0.94.6
>
> Attachments: HBASE-7763-trunk-TESTING.patch,
> HBASE-7763-trunk-TESTING.patch, HBASE-7763-trunk-TESTING.patch
>
>
> Currently compaction selection is not sorting based on size. This causes
> selection to choose larger files to re-write than are needed when bulk loads
> are involved.