[ 
https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248395#comment-14248395
 ] 

Jonathan Hsieh commented on HBASE-11861:
----------------------------------------

bq. This is why I insist on running the mob compaction in regions. If we do the 
mob compaction out of region or across regions, we have to lock the major 
compactions globally.

Nice catch on that race condition; I buy it.  This is essentially the same 
problem as with the MR sweeper approach, right? 

So we'd need to guarantee that committing the compacted mob file and bulk 
loading the new references blocks major compaction on the region where the ref 
bulk load is happening.  This means no major compactions before step #2, but 
they are allowed again after step #4.  
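
To make the blocking requirement concrete, here is a minimal sketch of a 
per-region gate. It is an illustration only, not HBase code: the class and 
method names (RegionCompactionGate, mob_commit, try_start_major_compaction) are 
hypothetical, and it assumes the mob compaction simply fails if a major 
compaction is already running on the region.

```python
import threading
from contextlib import contextmanager


class RegionCompactionGate:
    """Hypothetical per-region gate; illustrative only, not an HBase API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._mob_commits = 0          # mob compactions inside the protected window
        self._major_running = False    # True while a major compaction holds the region

    @contextmanager
    def mob_commit(self):
        """Hold while the compacted mob file is committed and its refs are bulk loaded."""
        with self._lock:
            if self._major_running:
                # Assumption: the caller retries later instead of waiting.
                raise RuntimeError("major compaction in progress on this region")
            self._mob_commits += 1
        try:
            yield
        finally:
            with self._lock:
                self._mob_commits -= 1

    def try_start_major_compaction(self):
        """Return True and claim the region only if no mob commit is in flight."""
        with self._lock:
            if self._mob_commits > 0 or self._major_running:
                return False
            self._major_running = True
            return True

    def finish_major_compaction(self):
        """Release the region after the major compaction completes."""
        with self._lock:
            self._major_running = False
```

Because the gate is per region, only the region receiving the ref bulk load is 
affected; other regions can still run major compactions.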

Let's spell out the costs of the different approaches: the global del mob scan 
for the mob compaction approach, and the per-region mob compaction. 

Meanwhile, I noticed you filed a new jira for counts, and I filed one for the 
del mob generator.  We can get code started on those and hash out this 
higher-level design while doing so.

bq. I think we could leave the expired (live longer than TTL) cells out of the 
del files. Let the ExpiredMobFileCleaner handle those mob files directly.

Sounds reasonable.  We need to enforce the mob file time ordering, though, to 
make sure the mob compaction is effective.



> Native MOB Compaction mechanisms.
> ---------------------------------
>
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
>
>
> Currently, the first cut of mob will have external processes to age off old 
> mob data (the ttl cleaner), and to compact away deleted or overwritten data 
> (the sweep tool).  
> From an operational point of view, having two external tools, especially one 
> that relies on MapReduce, is undesirable.  In this issue we'll tackle 
> integrating these into HBase without requiring external processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
