[ 
https://issues.apache.org/jira/browse/CASSANDRA-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-5087.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2.1
                   1.1.9
         Reviewer: jbellis

Added comments and committed.  (Note: to 1.1.9 and 1.2.1, but not 1.2.0.)
                
> Changing from higher to lower compaction throughput causes long (multi hour) 
> pause in large compactions
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5087
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5087
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: J.B. Langston
>            Assignee: J.B. Langston
>            Priority: Minor
>             Fix For: 1.1.9, 1.2.1
>
>
> We're running a major compaction against a column family that is 2.1TB (yes, 
> I know it's crazy huge, that's an entirely different discussion). During the 
> evenings, we run setcompactionthroughput 0 to unthrottle completely, and 
> throttle back down to 20 MB/s at the end of the maintenance window. 
> Every morning when we come in to check progress, we find that compaction 
> halted completely as soon as the throttling command was issued. 
> Eventually, compaction continues. I was looking at the throttling code, and I 
> think I see the issue, but would like confirmation:
> throttleDelta (org.apache.cassandra.utils.Throttle.throttleDelta) sets a 
> sleep time based on the amount of data transferred since the last throttle 
> time. Since we've gone from 20 MB/s to wide open and back to 20 MB/s, the 
> wait that is calculated is an attempt to average the new throttling rate 
> over the last 6.5 hours of running wide open.
> I think this could be fixed by resetting bytesAtLastDelay and 
> timeAtLastDelay to the current values after the check at line 64:
> Current:
>         // if the target changed, log
>         if (newTargetBytesPerMS != targetBytesPerMS)
>             logger.debug("{} target throughput now {} bytes/ms.", this, newTargetBytesPerMS);
>         targetBytesPerMS = newTargetBytesPerMS;
> New:
>         // if the target changed, log
>         if (newTargetBytesPerMS != targetBytesPerMS) {
>             logger.debug("{} target throughput now {} bytes/ms.", this, newTargetBytesPerMS);
>             if (newTargetBytesPerMS < targetBytesPerMS || targetBytesPerMS < 1) {
>                 bytesAtLastDelay += bytesDelta;
>                 timeAtLastDelay = System.currentTimeMillis();
>                 targetBytesPerMS = newTargetBytesPerMS;
>                 return;
>             }
>             targetBytesPerMS = newTargetBytesPerMS;
>         }
> There are some redundancies that could be removed there, but I wanted to 
> keep the change local to where I thought the problem was. 
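
The failure mode described above can be reproduced with a small, self-contained sketch. This is not the actual org.apache.cassandra.utils.Throttle class: ThrottleSketch, delayFor, the resetOnChange flag, and the injectable clock are all hypothetical names introduced for illustration, and the method returns the computed delay instead of sleeping so the arithmetic is easy to check. The figures (6 hours unthrottled at 100,000 bytes/ms, then a 20,000 bytes/ms target) are assumed, not taken from the report.

```java
import java.util.function.LongSupplier;

/** Hypothetical, simplified throttle; NOT the real Cassandra Throttle class. */
final class ThrottleSketch {
    private final LongSupplier clockMillis; // injectable clock (assumption, for determinism)
    private long bytesAtLastDelay = 0;
    private long timeAtLastDelay;
    private long targetBytesPerMS = -1;     // values below 1 mean "unthrottled"

    ThrottleSketch(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.timeAtLastDelay = clockMillis.getAsLong();
    }

    /**
     * Returns the number of ms the caller would have to sleep to respect the target.
     * With resetOnChange, the baseline is reset when the target is lowered or
     * throttling is switched back on, as proposed in the report.
     */
    long delayFor(long totalBytes, long newTargetBytesPerMS, boolean resetOnChange) {
        long now = clockMillis.getAsLong();
        if (newTargetBytesPerMS != targetBytesPerMS) {
            boolean lowered = newTargetBytesPerMS < targetBytesPerMS || targetBytesPerMS < 1;
            targetBytesPerMS = newTargetBytesPerMS;
            if (resetOnChange && lowered) {
                // Forget the unthrottled history: measure from this point on.
                bytesAtLastDelay = totalBytes;
                timeAtLastDelay = now;
                return 0;
            }
        }
        if (targetBytesPerMS < 1)
            return 0; // unthrottled; note that the baseline is NOT advanced here

        long bytesDelta = totalBytes - bytesAtLastDelay;
        long msSinceLast = now - timeAtLastDelay;
        long excessBytes = bytesDelta - msSinceLast * targetBytesPerMS;
        if (excessBytes <= 0)
            return 0; // at or below the target rate

        long delay = excessBytes / targetBytesPerMS;
        bytesAtLastDelay = totalBytes;
        timeAtLastDelay = now + delay;
        return delay;
    }

    public static void main(String[] args) {
        for (boolean reset : new boolean[] { false, true }) {
            long[] now = { 0 };
            ThrottleSketch t = new ThrottleSketch(() -> now[0]);
            t.delayFor(0, 0, reset);                 // unthrottle at t = 0
            now[0] = 21_600_000L;                    // 6 hours later
            long delay = t.delayFor(2_160_000_000_000L, 20_000, reset);
            System.out.println("resetOnChange=" + reset + " delay ms=" + delay);
        }
    }
}
```

Under these assumed numbers, the run without the reset computes a delay of 86,400,000 ms (24 hours), because the excess bytes from the whole unthrottled window are charged against the new, lower rate. With the reset the delay is 0 and throttling applies only to data transferred after the change.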

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
