[ https://issues.apache.org/jira/browse/HBASE-19528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295511#comment-16295511 ]
Dave Latham commented on HBASE-19528:
-------------------------------------
Adding a couple of notes: we wanted to avoid filling up the large compactions
queue with these requests, so that other large compactions can proceed. We
also wanted to limit the total number in progress at once on the cluster, to
limit the cluster-wide IO load (and the storage impact on a close-to-full
cluster with few large regions). And, as Rahul noted, we wanted to be sure
that the job got done completely, even in the presence of regions moving,
splitting, or merging.
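
To illustrate the "done completely" goal, here is a minimal Python sketch of a re-scan loop. It is not the tool's actual code: `find_pending`, `compact_batch`, and `max_rounds` are hypothetical names, and the idea of repeating the scan until nothing is left is an assumption about one way to cope with regions that move, split, or merge mid-run.

```python
def compact_until_done(find_pending, compact_batch, max_rounds=10):
    """Repeat scan-and-compact until a scan finds nothing left to do.

    find_pending  -- hypothetical callback returning the set of stores
                     that still need a major compaction
    compact_batch -- hypothetical callback that compacts the given stores
                     and blocks until they have finished
    Returns the number of compaction rounds that were run.
    """
    for rounds_run in range(max_rounds):
        pending = find_pending()
        if not pending:
            return rounds_run
        # Splits, merges, or moves during this call may leave new stores
        # uncompacted; the next scan picks them up.
        compact_batch(pending)
    if not find_pending():
        return max_rounds
    raise RuntimeError("stores still pending after %d rounds" % max_rounds)
```

The `max_rounds` guard is only there so the sketch cannot loop forever on a table that keeps splitting.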
> Major Compaction Tool
> ----------------------
>
> Key: HBASE-19528
> URL: https://issues.apache.org/jira/browse/HBASE-19528
> Project: HBase
> Issue Type: New Feature
> Reporter: churro morales
> Assignee: churro morales
> Fix For: 2.0.0, 3.0.0
>
>
> The basic overview of how this tool works is:
> Parameters:
> Table
> Stores
> ClusterConcurrency
> Timestamp
> So you input a table, desired concurrency and the list of stores you wish to
> major compact. The tool first checks the filesystem to see which stores need
> compaction based on the timestamp you provide (default is current time). It
> takes that list of stores that require compaction and executes those requests
> concurrently with at most N distinct RegionServers compacting at a given
> time. Each thread waits for the compaction to complete before moving to the
> next queue. If a region split, merge, or move happens, this tool ensures those
> regions get major compacted as well.
> This helps us in two ways: we can limit how much I/O bandwidth we use
> for major compaction cluster-wide, and we are guaranteed that, once the tool
> completes, all requested compactions have completed regardless of moves,
> merges, and splits.
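
The scheduling described above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the tool's implementation: `stores_needing_compaction`, `run_compactions`, and the `compact` callback are hypothetical names, and the rule that a store qualifies when its oldest file predates the cutoff is an assumed reading of "based on the timestamp you provide".

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def stores_needing_compaction(oldest_file_ts, cutoff_ts):
    """Select stores whose oldest file is older than the cutoff.

    oldest_file_ts maps (server, store) -> timestamp of the store's
    oldest file (assumed to come from a filesystem scan).
    """
    return [key for key, ts in oldest_file_ts.items() if ts < cutoff_ts]


def run_compactions(requests, cluster_concurrency, compact):
    """Run compactions with at most `cluster_concurrency` servers busy.

    requests -- iterable of (server, store) pairs
    compact  -- hypothetical callback that issues one major compaction
                and blocks until it has completed
    """
    # One queue of stores per RegionServer.
    queues = defaultdict(list)
    for server, store in requests:
        queues[server].append(store)

    def drain(server):
        # A single worker owns a server's queue, so each server runs
        # only one of these compactions at a time.
        for store in queues[server]:
            compact(server, store)

    # The pool size caps how many distinct servers compact concurrently.
    with ThreadPoolExecutor(max_workers=cluster_concurrency) as pool:
        # Consuming the results re-raises any exception from a worker.
        list(pool.map(drain, list(queues)))
```

In this sketch a worker drains one server's whole queue before picking up another server, which is a simplification; the text above says only that each thread waits for a compaction to finish before moving to the next queue.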