[ https://issues.apache.org/jira/browse/HBASE-15381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179185#comment-15179185 ]
Jingcheng Du commented on HBASE-15381:
--------------------------------------
bq. What do you see when this is going on? A master that lags burdened down by all the i/o?
There will be heavy I/O between the node where the HMaster (HM) resides and the HDFS DataNodes. This might increase the network latency between the HM and the RegionServers (RS). And as Anoop said, data locality is lost after the compaction. I am trying to address these issues in the new implementation.
bq. How you see it working? What happens when compactions get backed up?
In the distributed compaction, the compaction is triggered periodically by the HM, and the job is distributed to all RSs through a procedure. Each RS finds the files that belong to it and distributes them to its online regions.
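The per-RS step described above (each RS picks out its own files and hands them to its online regions) can be sketched roughly as follows. This is a hypothetical illustration, not HBase code: it assumes every candidate MOB file can be mapped to an owning region name, and `MobFileRef`, `regionName` and `assignToOnlineRegions` are made-up names. The procedure-based dispatch from the HM is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not HBase code: a RegionServer selecting the MOB
// files that belong to its online regions. MobFileRef, regionName and
// assignToOnlineRegions are illustrative names only.
final class MobFileAssignment {

    /** A MOB file path tagged with the region assumed to own it. */
    static final class MobFileRef {
        final String path;
        final String regionName;

        MobFileRef(String path, String regionName) {
            this.path = path;
            this.regionName = regionName;
        }
    }

    /**
     * Groups candidate files by owning region, keeping only regions that are
     * online on this RegionServer. Files owned by regions hosted elsewhere are
     * skipped, on the assumption that their own RegionServer handles them.
     */
    static Map<String, List<MobFileRef>> assignToOnlineRegions(
            List<MobFileRef> candidates, Set<String> onlineRegions) {
        Map<String, List<MobFileRef>> byRegion = new HashMap<>();
        for (MobFileRef file : candidates) {
            if (onlineRegions.contains(file.regionName)) {
                byRegion.computeIfAbsent(file.regionName, r -> new ArrayList<>()).add(file);
            }
        }
        return byRegion;
    }
}
```

Because every RS receives the same job and filters by its own online regions, no central component has to compute the full file-to-server mapping; that is one plausible reading of "each RS will find the files that belong to it".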
The MOB compaction in each region compacts the small files in batches. Each batch:
# Merges the small files into a bigger one (hopefully this big file won't need to be merged again).
# Bulk-loads the HFile that contains the meta cells (reference cells) into HBase. From then on, the new data is visible to users.
Any exception that occurs during a batch triggers a rollback of the compaction.
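The batch loop above (merge, then bulk-load the reference cells, rolling back on any exception) might look roughly like this. It is a hypothetical sketch: the `MobCompactionSteps` interface and its methods stand in for the real merge, bulk-load and cleanup logic and are not HBase APIs. The comment does not say whether later batches continue after a failed one; this sketch assumes they do, and rolls back only the failed batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-region batched MOB compaction with rollback.
// MobCompactionSteps and its methods are illustrative, not HBase APIs.
final class BatchedMobCompaction {

    /** Side-effecting steps of one batch (hypothetical). */
    interface MobCompactionSteps {
        /** Step 1: merges the small files into one bigger MOB file; returns its path. */
        String mergeSmallFiles(List<String> smallFiles) throws Exception;

        /** Step 2: bulk-loads an HFile of reference cells pointing at the merged file. */
        void bulkLoadReferenceCells(String mergedFile) throws Exception;

        /** Cleans up after a failed batch; mergedFileOrNull is null if step 1 failed. */
        void rollback(List<String> smallFiles, String mergedFileOrNull);
    }

    /** Splits the files into consecutive batches of at most batchSize files. */
    static List<List<String>> toBatches(List<String> files, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < files.size(); i += batchSize) {
            batches.add(new ArrayList<>(files.subList(i, Math.min(i + batchSize, files.size()))));
        }
        return batches;
    }

    /**
     * Compacts the files batch by batch. An exception in a batch rolls back
     * that batch only; batches already bulk-loaded stay visible.
     * Returns the number of batches that completed.
     */
    static int compact(List<String> smallFiles, int batchSize, MobCompactionSteps steps) {
        int committed = 0;
        for (List<String> batch : toBatches(smallFiles, batchSize)) {
            String merged = null;
            try {
                merged = steps.mergeSmallFiles(batch);
                steps.bulkLoadReferenceCells(merged); // new data becomes visible here
                committed++;
            } catch (Exception e) {
                steps.rollback(batch, merged);
            }
        }
        return committed;
    }
}
```

Keeping each batch small bounds how much work a single failure throws away, which fits the stated goal that a merged file "won't be merged again" once its batch commits.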
> Implement a distributed MOB compaction by procedure
> ---------------------------------------------------
>
> Key: HBASE-15381
> URL: https://issues.apache.org/jira/browse/HBASE-15381
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
>
> In MOB, a periodic compaction runs in the HMaster (it can be disabled by
> configuration) and merges small MOB files into bigger ones. Currently the
> compaction runs only in the HMaster, which is inefficient and might affect
> the HMaster's normal operation. This JIRA introduces a distributed MOB
> compaction: it is triggered by the HMaster, but all the compaction jobs are
> distributed to the HRegionServers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)