Re: Major Compaction Tool

Ted Yu Fri, 15 Dec 2017 14:36:06 -0800

bq. with at most N distinct RegionServers compacting at a given time

If per table balancing is not on, the regions for the underlying table may
not be evenly distributed across the cluster.
In that case, how would the tool decide which servers should perform compaction?


I think you can log a JIRA for upstreaming this tool.

Thanks

On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani <chu...@apache.org> wrote:

> Hi,
>
> I was wondering if anyone was interested in a manual major compactor tool.
>
> The basic overview of how this tool works is:
>
> Parameters:
>
>    - Table
>    - Stores
>    - ClusterConcurrency
>    - Timestamp
>
>
> So you input a table, desired concurrency and the list of stores you wish
> to major compact.  The tool first checks the filesystem to see which stores
> need compaction based on the timestamp you provide (default is current
> time).  It takes that list of stores that require compaction and executes
> those requests concurrently with at most N distinct RegionServers
> compacting at a given time.  Each thread waits for the compaction to
> complete before moving to the next queue.  If a region split, merge, or move
> happens, this tool ensures those regions get major compacted as well.
>
> We have started using this tool in production but were wondering if there
> is any interest from you guys in getting this upstream.
>
> This helps us in two ways: we can limit how much I/O bandwidth we use for
> major compaction cluster-wide, and we are guaranteed that once the tool
> completes, all requested compactions have completed regardless of moves,
> merges and splits.
>
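
For illustration, the scheduling described in the quoted message could be sketched roughly as below. This is not the tool's actual code. StoreRequest, Compactor, plan and run are hypothetical names, the rule "a store qualifies when its oldest file predates the given timestamp" is an assumption about what the timestamp check means, and handling of splits, merges and moves is left out. Each RegionServer gets its own queue and a fixed pool of N threads drains one queue per thread, so at most N distinct servers compact at once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class MajorCompactionSketch {

    /** Hypothetical stand-in for one store (column family) of one region. */
    static final class StoreRequest {
        final String server;
        final String region;
        final String store;
        final long oldestFileTs;

        StoreRequest(String server, String region, String store, long oldestFileTs) {
            this.server = server;
            this.region = region;
            this.store = store;
            this.oldestFileTs = oldestFileTs;
        }
    }

    /** Hypothetical hook: a real tool would request the compaction and block until it finishes. */
    interface Compactor {
        void compactAndWait(StoreRequest request) throws Exception;
    }

    /** Keep stores whose oldest file predates the cutoff (assumed rule), grouped per server. */
    static Map<String, Queue<StoreRequest>> plan(List<StoreRequest> all, long cutoffTs) {
        Map<String, Queue<StoreRequest>> byServer = new TreeMap<>();
        for (StoreRequest request : all) {
            if (request.oldestFileTs < cutoffTs) {
                byServer.computeIfAbsent(request.server, s -> new ArrayDeque<>()).add(request);
            }
        }
        return byServer;
    }

    /** Drain the per-server queues with at most `concurrency` servers active at once. */
    static List<StoreRequest> run(Map<String, Queue<StoreRequest>> byServer,
                                  int concurrency, Compactor compactor) {
        List<StoreRequest> failed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        for (Queue<StoreRequest> queue : byServer.values()) {
            // One task per server: its stores are compacted one after another.
            pool.execute(() -> {
                for (StoreRequest request : queue) {
                    try {
                        compactor.compactAndWait(request);
                    } catch (Exception e) {
                        failed.add(request); // left for the caller to retry or report
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failed;
    }
}
```

In a sketch like this, an uneven region distribution only means that some servers have longer queues than others. The cap on concurrently compacting servers still holds, although the busiest server determines the total run time.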
