[
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692581#comment-15692581
]
huaxiang sun commented on HBASE-17172:
--------------------------------------
Thanks [[email protected]]! Distributed compaction is definitely helpful
even with minor compaction, considering that mob compaction needs to acquire a
table lock. The purpose of major compaction is to reduce the number of
files. With HBASE-16891, users may still choose to disable the mob compaction
chore and run mob compaction manually during scheduled maintenance. Keeping
delete markers in hbase files in mob-enabled cfs is one way to avoid _del files;
the concern is that this is inconsistent with non-mob cfs (maybe it can be
provided as an option through config?). Another way may be to optimize it as
the current jira suggests. For example, if a user deletes some rows in one or
two regions, _del files will be created after compaction. With the current
major mob compaction, these _del files are included when compacting files for
other regions, which is not necessary; the net effect is that all mob files
are re-compacted. More ideas about how to optimize it are welcome, but I think
distributed mob compaction is definitely needed. Thanks.
> Optimize major mob compaction with _del files
> ---------------------------------------------
>
> Key: HBASE-17172
> URL: https://issues.apache.org/jira/browse/HBASE-17172
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, major mob compaction recompacts
> every mob file. This causes a lot of IO and slows down major mob compaction
> (it may take months to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups
> keyed by startKey. Then use firstKey/startKey to check, for each mob file,
> whether the _del file needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction of files after that
> timerange does not need to include the _del file, as these are newer files.
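
The two ideas above can be read as a pair of filters that decide whether a given
_del file is relevant to a given mob file. The following is only a rough sketch
of that reading, written in Python for brevity; it is not HBase code. The
FileInfo record, its fields, and both function names are hypothetical, and it
assumes that each file exposes its first/last row key and the min/max cell
timestamp it contains, and that "files after that timerange" means files whose
oldest cell is newer than every delete marker in the _del file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical summary of a mob file or a _del file (not an HBase class)."""
    name: str
    start_key: bytes  # smallest row key in the file
    end_key: bytes    # largest row key in the file (inclusive)
    min_ts: int       # oldest cell timestamp in the file
    max_ts: int       # newest cell timestamp in the file


def del_file_applies(mob: FileInfo, del_file: FileInfo) -> bool:
    """Return True if del_file could mask any cell stored in mob."""
    # Idea 1: the row-key ranges of the two files must overlap.
    keys_overlap = (mob.start_key <= del_file.end_key
                    and del_file.start_key <= mob.end_key)
    # Idea 2: a delete marker only masks cells that are not newer than it, so
    # a mob file whose oldest cell is newer than every marker is unaffected.
    time_overlap = mob.min_ts <= del_file.max_ts
    return keys_overlap and time_overlap


def files_to_recompact(mob_files, del_files):
    """Keep only the mob files that at least one _del file can affect."""
    return [m for m in mob_files
            if any(del_file_applies(m, d) for d in del_files)]
```

With filters like these, a delete that touches one or two regions would leave
mob files from other key ranges, and mob files written after the delete, out of
the compaction instead of forcing every mob file to be rewritten.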