[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694193#comment-15694193
 ] 

huaxiang sun commented on HBASE-17172:
--------------------------------------

{code}
Hmm. If the .del is not a performance killer, we don't need this. I reviewed 
the code, and I think the .del files are not the reason for the slow compaction; 
major compaction itself is.
{code}
I need to provide more background here. Let's say the mob files were major 
compacted one week ago. There are regionA and regionB, with 
regionA20161001*** and regionB20161001**** as the results of that previous 
major compaction. One del file was created for regionA during the past week. 
When a major compaction kicks in, both regionA20161001*** and 
regionB20161001*** are re-compacted. Re-compacting regionA20161001**** is 
needed, but re-compacting regionB20161001*** is a waste. Given that there are 
many other regions and many already-compacted files, this unnecessary 
compaction slows down the major compaction.
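
To make the scenario concrete, here is a minimal sketch of the selection I have in mind. It is not the actual HBase code: MobCompactionSelection, MobFile and filesNeedingRecompaction are hypothetical names, and a mob file is modeled only by the region start key it belongs to and a file name without the elided suffix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class MobCompactionSelection {

    /** Hypothetical model of a mob file: its region start key and its file name. */
    static final class MobFile {
        final String regionStartKey;
        final String name;

        MobFile(String regionStartKey, String name) {
            this.regionStartKey = regionStartKey;
            this.name = name;
        }
    }

    /**
     * Returns only the mob files whose region has at least one del file.
     * Files of regions without del files are skipped (assumption: they were
     * already major compacted and no delete marker can affect them).
     */
    static List<MobFile> filesNeedingRecompaction(List<MobFile> mobFiles,
                                                  Set<String> regionsWithDelFiles) {
        List<MobFile> selected = new ArrayList<>();
        for (MobFile file : mobFiles) {
            if (regionsWithDelFiles.contains(file.regionStartKey)) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<MobFile> files = Arrays.asList(
            new MobFile("regionA", "regionA20161001"),
            new MobFile("regionB", "regionB20161001"));
        // Only regionA got a del file during the past week.
        Set<String> regionsWithDelFiles = Collections.singleton("regionA");
        for (MobFile file : filesNeedingRecompaction(files, regionsWithDelFiles)) {
            System.out.println("recompact " + file.name);
        }
    }
}
```

With the two files from the example, only the regionA file is selected and the regionB file is left alone.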

> Optimize major mob compaction with _del files
> ---------------------------------------------
>
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, a major mob compaction recompacts 
> every mob file. This causes lots of IO and slows down major mob 
> compaction (it may take months to finish). This needs to be improved. A few 
> ideas are: 
> 1) Do not compact all _del files into one; instead, compact them in 
> groups keyed by startKey. Then use the firstKey/startKey of each mob 
> file to check whether the _del file needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction for files after that 
> timerange does not need to include the _del file, as these are newer files.
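
The timerange check in idea 2 could look roughly like the sketch below. It is only an illustration under assumptions: DelFileTimeRange, DelFile and mustInclude are hypothetical names, and "files after that timerange" is interpreted as mob files whose earliest cell timestamp is later than the newest timestamp covered by the _del file.

```java
public class DelFileTimeRange {

    /** Hypothetical model of a _del file: the timestamp range of its delete markers. */
    static final class DelFile {
        final long minTimestamp;
        final long maxTimestamp;

        DelFile(long minTimestamp, long maxTimestamp) {
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /**
     * Returns true if the _del file has to be included when compacting a mob file.
     * Assumption: a delete marker only affects cells that are not newer than it,
     * so a mob file whose earliest cell is newer than every marker can skip the file.
     */
    static boolean mustInclude(DelFile del, long mobFileMinTimestamp) {
        return mobFileMinTimestamp <= del.maxTimestamp;
    }

    public static void main(String[] args) {
        DelFile del = new DelFile(100L, 200L);
        System.out.println(mustInclude(del, 150L)); // mob file overlaps the del timerange
        System.out.println(mustInclude(del, 201L)); // mob file is newer than the del file
    }
}
```

This check would be combined with the startKey grouping from idea 1, so a _del file is only considered for mob files in its own partition.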



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
