Hi,

I was wondering if anyone was interested in a manual major compactor tool.

The basic overview of how this tool works is:

Parameters:

   - Table
   - Stores
   - ClusterConcurrency
   - Timestamp


So you input a table, the desired concurrency and the list of stores you
wish to major compact.  The tool first checks the filesystem to see which
stores need compaction based on the timestamp you provide (default is the
current time).  It takes the list of stores that require compaction and
executes those requests concurrently, with at most N distinct RegionServers
compacting at any given time.  Each thread waits for its compaction to
complete before moving on to the next queued request.  If a region splits,
merges or moves while the tool is running, the tool ensures the resulting
regions get major compacted as well.
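
To make the flow above a bit more concrete, here is a minimal Python sketch
of the two main steps: filtering stores by timestamp and draining the
requests with a cap on concurrently compacting servers.  It is only an
illustration, not the tool itself.  The function names, the data shapes and
the "any store file older than the timestamp" rule are my assumptions, the
compact callback is a stand-in for whatever actually issues and waits on the
compaction, and the split/merge/move re-check is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def stores_needing_compaction(store_files, timestamp):
    """Return the stores that should be major compacted.

    store_files maps a hypothetical (region, store) key to the list of
    modification times of that store's files.
    Assumption: a store qualifies if any of its files is older than timestamp.
    """
    return [
        key
        for key, mtimes in store_files.items()
        if any(mtime < timestamp for mtime in mtimes)
    ]


def run_throttled(requests_by_server, concurrency, compact):
    """Run all requests with at most `concurrency` servers compacting at once.

    requests_by_server maps a server name to its list of pending requests.
    compact(server, request) is a caller-supplied (hypothetical) function
    that blocks until the compaction has completed.
    """

    def drain(server, queue):
        # One worker owns one server's queue, so a server never runs
        # two of these compactions at the same time.
        for request in queue:
            compact(server, request)

    # The pool size caps how many distinct servers are active at a time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(drain, server, queue)
            for server, queue in requests_by_server.items()
        ]
        for future in futures:
            future.result()  # re-raise any error from a worker
```

In this sketch each server's queue is handled by a single worker, so the pool
size plays the role of ClusterConcurrency.  A real implementation would also
need to re-check regions after each pass to catch splits, merges and moves.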

We have started using this tool in production but were wondering if there
is any interest from you guys in getting this upstream.

This helps us in two ways: we can limit how much I/O bandwidth major
compaction uses cluster-wide, and we are guaranteed that once the tool
completes, all requested compactions have completed regardless of moves,
merges and splits.
