huaxiang sun commented on HBASE-17172:

Thanks Jingcheng. Regarding "If we skip the compacted files, the threshold
is not that useful anymore.": today, if there is only one file in the partition
and there are no _del files, the file is skipped. With a _del file, the current
logic is to compact the already-compacted file with the _del file. Say there
is one mob file, regionA20161101****, which was already compacted. On 12/1/2016
there is a _del file, regionB20161201****_del, and mob compaction kicks in.
regionA20161101**** is smaller than the threshold, so it is picked for
compaction. Since there is a _del file, regionA20161101**** and
regionB20161201****_del are compacted into regionA20161101****_1. After that,
regionB20161201****_del cannot be deleted because this was not an allFile
compaction. In the next mob compaction, regionA20161101****_1 and
regionB20161201****_del will be picked up again and compacted into
regionA20161101****_2. So in this case the _del file causes unnecessary IO on
every run. Could you confirm whether this is the case?
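
To make the scenario concrete, here is a rough simulation of it, not the actual
HBase code. MobFile, compact_once and the "large" file that keeps the run from
being an allFile compaction are all made-up names and assumptions for
illustration only.

```python
from dataclasses import dataclass


@dataclass
class MobFile:
    name: str
    size: int
    generation: int = 0  # how many times this file has been rewritten


def compact_once(mob_files, has_del_file, threshold):
    """One simplified mob compaction round (illustrative model only).

    Returns (new_mob_files, del_file_still_present, bytes_rewritten).
    """
    selected = [f for f in mob_files if f.size < threshold]
    # Without a _del file, a lone small file is skipped.
    if not selected or (not has_del_file and len(selected) == 1):
        return mob_files, has_del_file, 0

    untouched = [f for f in mob_files if f.size >= threshold]
    all_files = len(selected) == len(mob_files)

    base = selected[0]
    merged = MobFile(base.name, sum(f.size for f in selected), base.generation + 1)

    # The _del file can only be dropped after an all-files compaction.
    still_has_del = has_del_file and not all_files
    return untouched + [merged], still_has_del, merged.size


files = [MobFile("small", 10), MobFile("large", 1000)]
has_del, total_rewritten = True, 0
for _ in range(3):
    files, has_del, rewritten = compact_once(files, has_del, threshold=100)
    total_rewritten += rewritten
```

In this model the small file is rewritten on every round (its generation keeps
growing) while the _del file never goes away, which is the repeated IO I am
worried about.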

> Optimize major mob compaction with _del files
> ---------------------------------------------
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
> Today, when there is a _del file in mobdir, major mob compaction recompacts
> every mob file. This causes a lot of IO and slows down major mob compaction
> (it may take months to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups
> keyed by startKey. Then use each mob file's firstKey/startKey to decide
> whether the _del file needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction of files newer than
> that timerange does not need to include the _del file.
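
The two ideas above can be sketched as a single filter. This is only a sketch
under assumptions: FileInfo and del_file_applies are hypothetical names, the key
range is treated as inclusive on both ends, and the real HBase classes and
metadata are not used here.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    start_key: bytes
    end_key: bytes  # treated as inclusive in this sketch
    min_ts: int
    max_ts: int


def del_file_applies(mob: FileInfo, del_file: FileInfo) -> bool:
    """Return True if del_file could mask cells stored in mob."""
    # Idea 1: the key ranges must overlap (bytes compare lexicographically).
    keys_overlap = (
        mob.start_key <= del_file.end_key and del_file.start_key <= mob.end_key
    )
    # Idea 2: a delete marker only masks cells at or before its timestamp,
    # so a mob file that is entirely newer than the _del file is unaffected.
    may_cover_in_time = mob.min_ts <= del_file.max_ts
    return keys_overlap and may_cover_in_time


del_file = FileInfo(b"a", b"m", 100, 200)
older_mob = FileInfo(b"c", b"f", 50, 150)
newer_mob = FileInfo(b"c", b"f", 300, 400)
other_keys = FileInfo(b"n", b"z", 50, 150)
```

With these sample values only older_mob would need the _del file; newer_mob is
excluded by the timerange check and other_keys by the key range check.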

This message was sent by Atlassian JIRA
