[ 
https://issues.apache.org/jira/browse/HBASE-19528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317298#comment-16317298
 ] 

Ted Yu commented on HBASE-19528:
--------------------------------

Please add the license header to the new files.
Also add the audience annotation.
{code}
+  int getCompactionsLeft() {
{code}
'Compactions' can mean either the compaction requests or the number of compactions.
Please rename the method to reflect that it returns a count.
{code}
+  boolean atCapacity() {
+    lock.readLock().lock();
+    try {
+      return compactingServers.size() >= concurrentServers;
{code}
'atCapacity' seems to imply only the case of compactingServers.size() == 
concurrentServers, but the check uses >=. See if there is a better method name.
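To illustrate the naming concern, here is a minimal sketch rather than a proposed patch. Only the fields lock, compactingServers and concurrentServers come from the snippet above; the class name CompactingServerTracker, the method name isAtOrOverCapacity and the helper markCompacting are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the class in the patch; not the actual code.
class CompactingServerTracker {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Set<String> compactingServers = new HashSet<>();
  private final int concurrentServers;

  CompactingServerTracker(int concurrentServers) {
    this.concurrentServers = concurrentServers;
  }

  // The name states both cases covered by the >= comparison.
  boolean isAtOrOverCapacity() {
    lock.readLock().lock();
    try {
      return compactingServers.size() >= concurrentServers;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Hypothetical helper, present only so the check can be exercised.
  void markCompacting(String serverName) {
    lock.writeLock().lock();
    try {
      compactingServers.add(serverName);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

If the set can never grow past concurrentServers, a plain == comparison with the current name would also be consistent.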
{code}
+              .getRegionInfo().getEncodedName(), " already compacted");
+        }
+        return familiesToCompact;
{code}
Why return familiesToCompact after seeing only the first StoreFileInfo?

Please put @VisibleForTesting one line above the method it affects.
{code}
+  private final Connection connection;
{code}
Why include the connection in the request? Can't the connection be created 
locally?

> Major Compaction Tool 
> ----------------------
>
>                 Key: HBASE-19528
>                 URL: https://issues.apache.org/jira/browse/HBASE-19528
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 3.0.0
>
>         Attachments: HBASE-19528.patch
>
>
> The basic overview of how this tool works is:
> Parameters:
>     Table
>     Stores
>     ClusterConcurrency
>     Timestamp
> So you input a table, desired concurrency and the list of stores you wish to 
> major compact.  The tool first checks the filesystem to see which stores need 
> compaction based on the timestamp you provide (default is current time).  It 
> takes that list of stores that require compaction and executes those requests 
> concurrently with at most N distinct RegionServers compacting at a given 
> time.  Each thread waits for the compaction to complete before moving to the 
> next queue.  If a region split, merge or move happens this tool ensures those 
> regions get major compacted as well. 
> This helps us in two ways: we can limit how much I/O bandwidth we use 
> for major compaction cluster-wide, and we are guaranteed that once the tool 
> completes, all requested compactions have completed regardless of moves, merges 
> and splits. 
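
The "at most N distinct RegionServers compacting at a given time" behaviour described above can be sketched as follows. This is an illustrative assumption about one way to get that property (one task per server on a fixed pool of N threads), not the tool's actual implementation. The names ThrottledCompactionSketch, runAll and compactAndWait are hypothetical, and the sketch does not handle splits, merges or moves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch; servers and stores are plain strings here.
class ThrottledCompactionSketch {

  // Runs every server's queue of stores, with at most maxConcurrentServers
  // servers being worked on at once. Returns the peak number of servers
  // that were observed in progress at the same time.
  static int runAll(Map<String, List<String>> storesByServer,
      int maxConcurrentServers, Consumer<String> compactAndWait) {
    // One thread per in-flight server, so the pool size is the limit.
    ExecutorService pool = Executors.newFixedThreadPool(maxConcurrentServers);
    AtomicInteger active = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> pending = new ArrayList<>();
    try {
      for (List<String> queue : storesByServer.values()) {
        pending.add(pool.submit(() -> {
          peak.accumulateAndGet(active.incrementAndGet(), Math::max);
          try {
            // Each store is compacted to completion before the next one starts.
            for (String store : queue) {
              compactAndWait.accept(store);
            }
          } finally {
            active.decrementAndGet();
          }
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // wait for every server's queue to drain
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }
}
```

In a real tool, compactAndWait would issue the major compaction request and poll until it finishes; here it is whatever callback the caller supplies.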



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
