[
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707080#comment-15707080
]
huaxiang sun commented on HBASE-17172:
--------------------------------------
Hi [~jingcheng.du] and [~anoop.hbase], I did some more code reading and found
that _del files can be included in a minor mob compaction when the file size is
less than the threshold. If the user sets a high threshold value, even
already-compacted files can be added to the compaction list again and be
compacted with the del files. If we want to deal with _del files mainly in
major mob compaction, can we skip these already-compacted files in minor
compaction? Something like the following in select(), after files are added to
the filesToCompact map. This is to speed up minor compaction with del files.
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
index 33aecc0..dab05d2 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
@@ -25,6 +25,7 @@ import java.util.Collection;
 import java.util.Collections;
 import java.util.Date;
 import java.util.HashMap;
+import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.Map.Entry;
@@ -179,6 +180,23 @@ public class PartitionedMobCompactor extends MobCompactor {
         selectedFileCount++;
       }
     }
+
+    /*
+     * If it is not a major mob compaction with del files, and the file number in Partition is 1,
+     * remove the partition from filesToCompact list to avoid re-compacting files which has been
+     * compacted with del files.
+     */
+    if (!allFiles && (allDelFiles.size() > 0)) {
+      for(Iterator<Map.Entry<CompactionPartitionId, CompactionPartition>> it =
+          filesToCompact.entrySet().iterator(); it.hasNext(); ) {
+        Map.Entry<CompactionPartitionId, CompactionPartition> entry = it.next();
+        if (entry.getValue().getFileNumbers() <= 1) {
+          it.remove();
+          --selectedFileCount;
+        }
+      }
+    }
+
     PartitionedMobCompactionRequest request = new PartitionedMobCompactionRequest(
       filesToCompact.values(), allDelFiles);
     if (candidates.size() == (allDelFiles.size() + selectedFileCount + irrelevantFileCount)) {
{code}
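To make the filtering step easier to try outside of HBase, here is a minimal standalone sketch of the same idea. It is not the actual PartitionedMobCompactor code: the class name SinglePartitionFilter and the method dropSingleFilePartitions are hypothetical, partition ids are plain Strings, and each partition is just a list of file names. It drops every partition that holds at most one file and reports how many files were dropped, so the caller can adjust its selected-file counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the filesToCompact map used in select():
// String keys replace CompactionPartitionId, List<String> replaces CompactionPartition.
class SinglePartitionFilter {

  // Removes partitions with at most one file; returns the number of files dropped.
  static int dropSingleFilePartitions(Map<String, List<String>> filesToCompact) {
    int dropped = 0;
    Iterator<Map.Entry<String, List<String>>> it = filesToCompact.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, List<String>> entry = it.next();
      if (entry.getValue().size() <= 1) {
        dropped += entry.getValue().size();
        it.remove(); // safe removal while iterating
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    Map<String, List<String>> partitions = new LinkedHashMap<>();
    partitions.put("p1", new ArrayList<>(Arrays.asList("mobfile-a")));
    partitions.put("p2", new ArrayList<>(Arrays.asList("mobfile-b", "mobfile-c")));
    int dropped = dropSingleFilePartitions(partitions);
    System.out.println("dropped=" + dropped + ", remaining=" + partitions.keySet());
  }
}
```

The sketch counts the files actually dropped rather than decrementing by one per partition; for partitions that always hold exactly one file the two are equivalent.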
> Optimize major mob compaction with _del files
> ---------------------------------------------
>
> Key: HBASE-17172
> URL: https://issues.apache.org/jira/browse/HBASE-17172
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, major mob compaction recompacts
> every mob file. This causes a lot of IO and slows down major mob
> compaction (it may take months to finish). This needs to be improved. A few
> ideas are:
> 1) Do not compact all _del files into one; instead, compact them in
> groups keyed by startKey. Then use the firstKey/startKey of each mob
> file to decide whether the _del file needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction for files after that
> timerange does not need to include the _del file, as these are newer files.
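
The two ideas in the description can be combined into a single relevance check, sketched below under stated assumptions. Nothing here is an HBase API: DelFileRelevance, FileInfo and delFileApplies are hypothetical names, row keys are modeled as Strings (HBase itself compares row keys as unsigned bytes), and each file is assumed to expose its first/last key and its min/max cell timestamp. A _del file is treated as relevant to a mob file only if their key ranges overlap and the mob file contains cells no newer than the newest delete marker.

```java
// Hypothetical sketch: decide whether a _del file must be read when compacting a mob file.
class DelFileRelevance {

  // Assumed per-file summary; field names are illustrative only.
  static final class FileInfo {
    final String firstKey;
    final String lastKey;
    final long minTimestamp;
    final long maxTimestamp;

    FileInfo(String firstKey, String lastKey, long minTimestamp, long maxTimestamp) {
      this.firstKey = firstKey;
      this.lastKey = lastKey;
      this.minTimestamp = minTimestamp;
      this.maxTimestamp = maxTimestamp;
    }
  }

  static boolean delFileApplies(FileInfo mobFile, FileInfo delFile) {
    // Idea 1: the key ranges [firstKey, lastKey] must overlap.
    boolean keysOverlap = mobFile.firstKey.compareTo(delFile.lastKey) <= 0
        && delFile.firstKey.compareTo(mobFile.lastKey) <= 0;
    // Idea 2: a mob file whose oldest cell is newer than every delete marker is unaffected.
    boolean hasCellsCoveredByDeletes = mobFile.minTimestamp <= delFile.maxTimestamp;
    return keysOverlap && hasCellsCoveredByDeletes;
  }

  public static void main(String[] args) {
    FileInfo mob = new FileInfo("row100", "row200", 50L, 60L);
    FileInfo olderDel = new FileInfo("row150", "row180", 10L, 40L);
    FileInfo newerDel = new FileInfo("row150", "row180", 10L, 55L);
    System.out.println(delFileApplies(mob, olderDel)); // mob file is newer than all markers
    System.out.println(delFileApplies(mob, newerDel)); // overlapping keys and timestamps
  }
}
```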