[
https://issues.apache.org/jira/browse/HBASE-19528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318876#comment-16318876
]
churro morales commented on HBASE-19528:
----------------------------------------
[~carp84], I looked over the GitHub project very quickly. While the overall
goal is similar, there are quite a few things it doesn't handle: region moves,
merges, and splits; it offers no compaction guarantees; and once compaction has
completed there is no verification that the filesystem state is actually
correct. While I think it's useful, I don't think it provides the guarantees
necessary to include as a first-class HBase feature.
> Major Compaction Tool
> ----------------------
>
> Key: HBASE-19528
> URL: https://issues.apache.org/jira/browse/HBASE-19528
> Project: HBase
> Issue Type: New Feature
> Reporter: churro morales
> Assignee: churro morales
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-19528.patch, HBASE-19528.v1.patch
>
>
> The basic overview of how this tool works is:
> Parameters:
> Table
> Stores
> ClusterConcurrency
> Timestamp
> So you input a table, desired concurrency and the list of stores you wish to
> major compact. The tool first checks the filesystem to see which stores need
> compaction based on the timestamp you provide (default is current time). It
> takes that list of stores that require compaction and executes those requests
> concurrently with at most N distinct RegionServers compacting at a given
> time. Each thread waits for the compaction to complete before moving to the
> next queue. If a region split, merge, or move happens, this tool ensures
> those regions get major compacted as well.
> This helps us in two ways: we can limit how much I/O bandwidth we use for
> major compaction cluster-wide, and we are guaranteed that, once the tool
> completes, all requested compactions have completed regardless of moves,
> merges and splits.
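
The flow described above (select stores by timestamp, then compact with at most N distinct RegionServers busy at once) can be sketched roughly as follows. This is a minimal, self-contained simulation and not the code in the attached patch: the names MajorCompactionSketch, StoreInfo, selectStores and compactAll are hypothetical, no HBase API is called, and it assumes a store needs compaction when its oldest file is older than the supplied timestamp. Handling of splits, merges and moves is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class MajorCompactionSketch {

    /** Hypothetical stand-in for one store hosted on one RegionServer. */
    static final class StoreInfo {
        final String server;
        final String store;
        final long oldestFileTimestamp;

        StoreInfo(String server, String store, long oldestFileTimestamp) {
            this.server = server;
            this.store = store;
            this.oldestFileTimestamp = oldestFileTimestamp;
        }
    }

    /** Result of a run: what was compacted and how many servers were busy at peak. */
    static final class Result {
        final List<String> compacted;
        final int peakServers;

        Result(List<String> compacted, int peakServers) {
            this.compacted = compacted;
            this.peakServers = peakServers;
        }
    }

    /** Assumption: a store needs compaction if its oldest file predates the cutoff. */
    static Map<String, Queue<StoreInfo>> selectStores(List<StoreInfo> stores, long cutoff) {
        Map<String, Queue<StoreInfo>> byServer = new TreeMap<>();
        for (StoreInfo s : stores) {
            if (s.oldestFileTimestamp < cutoff) {
                byServer.computeIfAbsent(s.server, k -> new ArrayDeque<>()).add(s);
            }
        }
        return byServer;
    }

    /** One task per server queue; the pool size caps how many servers compact at once. */
    static Result compactAll(Map<String, Queue<StoreInfo>> byServer, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        List<String> done = Collections.synchronizedList(new ArrayList<>());
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (Queue<StoreInfo> queue : byServer.values()) {
            pool.submit(() -> {
                peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                for (StoreInfo s : queue) {
                    // A real tool would request the compaction here and wait for it to finish.
                    done.add(s.server + "/" + s.store);
                }
                active.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for compactions", e);
        }
        List<String> sorted = new ArrayList<>(done);
        Collections.sort(sorted);
        return new Result(sorted, peak.get());
    }

    public static void main(String[] args) {
        List<StoreInfo> stores = List.of(
            new StoreInfo("rs1", "region1/cf", 100L),
            new StoreInfo("rs1", "region2/cf", 900L),
            new StoreInfo("rs2", "region3/cf", 200L));
        Result r = compactAll(selectStores(stores, 500L), 1);
        System.out.println(r.compacted + " peak servers: " + r.peakServers);
    }
}
```

Giving each RegionServer its own queue and sizing the thread pool to the requested concurrency is one simple way to get the "at most N distinct RegionServers" bound; the actual patch may enforce it differently.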