[
https://issues.apache.org/jira/browse/HBASE-15381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231546#comment-15231546
]
Jingcheng Du commented on HBASE-15381:
--------------------------------------
Thanks [~tedyu].
The sweep tool is a tool, whereas distributed mob compaction is a compaction
mechanism that runs periodically in the cluster.
The sweep tool uses MapReduce; its work is distributed to the RSs by mappers and
reducers. It scans the whole mob table and merges the small files that the cells
link to (cells in HBase -> mob files).
Distributed mob compaction uses a procedure, and its work is distributed to the
RSs by the procedure as well. It handles the mob files directly, merges the small
files into bigger ones, and finally adds the new reference cells back to HBase
by bulk load (mob files -> cells in HBase).
The sweeper goes "cells in HBase -> mob files", and mob compaction goes "mob
files -> cells in HBase": two different directions.
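
To illustrate the "mob files -> cells in HBase" direction, here is a minimal
sketch of the first step only: picking small mob files and grouping them into
merge batches. It is not the HBase implementation. The names
MobCompactionSketch, MobFile and planBatches, and both size parameters, are
hypothetical and chosen only for illustration; writing the merged files and
bulk loading the new reference cells are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: none of these types are part of the HBase API.
class MobCompactionSketch {

    /** Hypothetical stand-in for a mob file: a name and a size in bytes. */
    static final class MobFile {
        final String name;
        final long sizeBytes;

        MobFile(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Groups files smaller than smallFileThreshold into batches whose total
     * size does not exceed targetSize. Files at or above the threshold are
     * left alone, and a batch holding a single file is dropped because there
     * is nothing to merge it with.
     */
    static List<List<MobFile>> planBatches(List<MobFile> files,
                                           long smallFileThreshold,
                                           long targetSize) {
        List<List<MobFile>> batches = new ArrayList<>();
        List<MobFile> current = new ArrayList<>();
        long currentSize = 0;
        for (MobFile f : files) {
            if (f.sizeBytes >= smallFileThreshold) {
                continue; // already big enough, not a merge candidate
            }
            if (!current.isEmpty() && currentSize + f.sizeBytes > targetSize) {
                if (current.size() > 1) {
                    batches.add(current);
                }
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(f);
            currentSize += f.sizeBytes;
        }
        if (current.size() > 1) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<MobFile> files = Arrays.asList(
            new MobFile("a", 10), new MobFile("b", 20),
            new MobFile("c", 500), new MobFile("d", 30));
        for (List<MobFile> batch : planBatches(files, 100, 60)) {
            StringBuilder sb = new StringBuilder("merge:");
            for (MobFile f : batch) {
                sb.append(' ').append(f.name);
            }
            System.out.println(sb);
        }
    }
}
```

In the distributed design each such batch would be handled on an RS rather than
in HMaster; how batches are assigned to RSs is not covered by this sketch.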
> Implement a distributed MOB compaction by procedure
> ---------------------------------------------------
>
> Key: HBASE-15381
> URL: https://issues.apache.org/jira/browse/HBASE-15381
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: mob distributed compaction design.pdf
>
>
> In MOB, there is a periodic compaction that runs in HMaster (it can be
> disabled by configuration), in which some small mob files are merged into
> bigger ones. Currently the compaction runs only in HMaster, which is not
> efficient and might impact the running of HMaster. This JIRA introduces a
> distributed MOB compaction: it is triggered by HMaster, but all the
> compaction jobs are distributed to HRegionServers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)