The tool creates a Map of servers to the CompactionRequests that need to be performed. It always selects the server with the largest queue (that is not currently compacting) to compact next.
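
To make the selection rule concrete, here is a minimal sketch of it in Java. This is not the code attached to the JIRA: the class name, the method names, and the use of plain strings for servers and requests are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch; names are illustrative and not taken from the actual tool. */
class CompactionQueues {
    // Server name -> pending compaction requests (modelled here as plain strings).
    private final Map<String, Queue<String>> pending = new HashMap<>();
    // Servers that currently have a compaction in flight.
    private final Set<String> compacting = new HashSet<>();

    public synchronized void add(String server, String request) {
        pending.computeIfAbsent(server, s -> new ArrayDeque<>()).add(request);
    }

    /** Picks the idle server with the largest non-empty queue and marks it busy. */
    public synchronized Optional<String> reserveLargestIdleServer() {
        String best = null;
        int bestSize = 0;
        for (Map.Entry<String, Queue<String>> e : pending.entrySet()) {
            int size = e.getValue().size();
            if (size > bestSize && !compacting.contains(e.getKey())) {
                best = e.getKey();
                bestSize = size;
            }
        }
        if (best != null) {
            compacting.add(best);
        }
        return Optional.ofNullable(best);
    }

    /** Takes the next request for a server that was reserved earlier. */
    public synchronized String poll(String server) {
        Queue<String> queue = pending.get(server);
        return queue == null ? null : queue.poll();
    }

    /** Marks the server idle again once its compaction has finished. */
    public synchronized void release(String server) {
        compacting.remove(server);
    }
}
```

In this sketch a worker thread would call reserveLargestIdleServer(), poll() one request, wait for that compaction to finish, and then release() the server. Because the largest idle queue is always picked first, servers holding more of the table's regions get visited more often when the regions are unevenly distributed.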
I created a JIRA for this tool: HBASE-19528.

On Fri, Dec 15, 2017 at 2:35 PM, Ted Yu <[email protected]> wrote:

> bq. with at most N distinct RegionServers compacting at a given time
>
> If per table balancing is not on, the regions for the underlying table may
> not be evenly distributed across the cluster.
> In that case, how would the tool decide which servers to perform
> compaction on?
>
> I think you can log a JIRA for upstreaming this tool.
>
> Thanks
>
> On Fri, Dec 15, 2017 at 2:01 PM, rahul gidwani <[email protected]> wrote:
>
> > Hi,
> >
> > I was wondering if anyone was interested in a manual major compactor
> > tool.
> >
> > The basic overview of how this tool works is:
> >
> > Parameters:
> >
> > - Table
> > - Stores
> > - ClusterConcurrency
> > - Timestamp
> >
> > So you input a table, desired concurrency and the list of stores you wish
> > to major compact. The tool first checks the filesystem to see which stores
> > need compaction based on the timestamp you provide (default is current
> > time). It takes that list of stores that require compaction and executes
> > those requests concurrently with at most N distinct RegionServers
> > compacting at a given time. Each thread waits for the compaction to
> > complete before moving to the next queue. If a region split, merge or move
> > happens, this tool ensures those regions get major compacted as well.
> >
> > We have started using this tool in production but were wondering if there
> > is any interest from you guys in getting this upstream.
> >
> > This helps us in two ways: we can limit how much I/O bandwidth we are using
> > for major compaction cluster wide, and we are guaranteed after the tool
> > completes that all requested compactions complete regardless of moves,
> > merges and splits.
