[
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692088#comment-15692088
]
Jingcheng Du commented on HBASE-17172:
--------------------------------------
Thanks [~huaxiang]!
A major compaction compacts all the files even when there are no del files,
which is slow. Is the slowness related to the del files? How about increasing
the number of compaction threads to reduce the running time?
Actually, if deletes are rare, we can always keep the delete markers in the
HBase files of a mob-enabled cf, even during all-files and major compactions.
Then we won't need the .del files in mob anymore.
If the slowness is not related to the .del files, I guess we have to fix the
slow compaction by implementing a distributed compaction. I filed a JIRA,
HBASE-15381, to implement this. The patch is there, but I haven't rebased it
for a long time. Are you interested in taking it over?
> Optimize major mob compaction with _del files
> ---------------------------------------------
>
> Key: HBASE-17172
> URL: https://issues.apache.org/jira/browse/HBASE-17172
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, a major mob compaction recompacts
> every mob file. This causes a lot of IO and slows down major mob compaction
> (it may take months to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups
> keyed by startKey. Then use the firstKey/startKey of each mob file to decide
> whether the _del file needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction of files after that
> timerange does not need to include the _del file, as these are newer files.
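The two ideas in the description could be sketched roughly as below. This is
only an illustration, not the MOB compactor code: DelFileFilter, FileRange,
needsDelFile and selectDelFiles are hypothetical names, and row keys are plain
Strings here, whereas real HBase keys are byte arrays. It assumes each file
exposes its key range and its cell timestamp range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the HBase code base.
final class DelFileFilter {

    /** Key range and timestamp range covered by one file (assumed metadata). */
    static final class FileRange {
        final String startKey; // smallest row key in the file
        final String endKey;   // largest row key in the file
        final long minTs;      // oldest cell timestamp in the file
        final long maxTs;      // newest cell timestamp in the file

        FileRange(String startKey, String endKey, long minTs, long maxTs) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Returns true if the del file could affect cells in the mob file. */
    static boolean needsDelFile(FileRange mob, FileRange del) {
        // Idea 2: a delete marker only masks cells at or below its own
        // timestamp, so a mob file whose oldest cell is newer than every
        // marker in the del file cannot be affected.
        if (mob.minTs > del.maxTs) {
            return false;
        }
        // Idea 1: the del file matters only if its key range overlaps
        // the key range of this mob file (partition).
        return del.startKey.compareTo(mob.endKey) <= 0
            && mob.startKey.compareTo(del.endKey) <= 0;
    }

    /** Picks the del files that must be included when compacting one mob file. */
    static List<FileRange> selectDelFiles(FileRange mob, List<FileRange> delFiles) {
        List<FileRange> selected = new ArrayList<>();
        for (FileRange del : delFiles) {
            if (needsDelFile(mob, del)) {
                selected.add(del);
            }
        }
        return selected;
    }
}
```

With a check like this, a mob file for which selectDelFiles returns an empty
list would not need to be rewritten just because some _del file exists.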
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)