[jira] [Updated] (CASSANDRA-3592) Major Compaction Incredibly Slow

Dan Hendry (Updated) (JIRA) Thu, 08 Dec 2011 08:42:08 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dan Hendry updated CASSANDRA-3592:
----------------------------------

    Description: 
Twice now (on different nodes), I have observed major compaction for certain 
column families take *significantly* longer on 1.0.3 in comparison to 0.8.6. 
For example,

On the 0.8.6 node, the post compaction log message:
{noformat}CompactionManager.java (line 608) Compacted to XXX. 339,164,959,170 
to 158,825,469,883 (~46% of original) bytes for 25,996 keys.  Time: 
26,934,317ms.{noformat}

On the 1.0.3 node, the post compaction log message:
{noformat} CompactionTask.java (line 213) Compacted to [XXX].  222,338,354,529 
to 147,751,403,084 (~66% of original) bytes for 26,100 keys at 0.562045MB/s. 
Time: 250,703,563ms.{noformat}

So... literally an order of magnitude slower on 1.0.3 in comparison to 0.8.6.

Relevant configuration settings:
* compaction_throughput_mb_per_sec: 0 (why? because the compaction throttling 
logic as currently implemented is highly unsuitable for wide rows but thats a 
different issue)
* in_memory_compaction_limit_in_mb: 128

Column family characteristics:
* Many wide rows (~5% of rows greater than > 10MB and hundreds of rows greater 
than 100 MB, with many small columns).
* Heavy use of expiring columns - each row represents data for a particular 
hour so typically all columns in the row will expire together.
* The significant size shrinkage as reported by the log messages is due mainly 
to expired data being cleaned up (I typically trigger major compaction when 
30-50% of the on disk data has expired which is about once every 3 weeks per 
node).
* Perhaps obviously: size tiered compaction and no compression (the schema has 
not changed since the partial upgrade to 1.0.x)
* Standard column family

Performance notes during compaction:
* Nice CPU usage and load average is basically the same between 0.8.6 and 1.0.3 
- ie, compaction IS running and is not getting stalled or hung up anywhere. 
* Compaction is IO bound on the 0.8.6 machines - the disks see heavy, constant 
utilization when compaction is running.
* Compaction is uses virtually no IO on the 1.0.3 machines - disk utilization 
is virtually no different when compacting vs not compacting (but at the same 
time, CPU usage and load average clearly indicate that compaction IS running).

Finally, I have not had time to profile more thoroughly but jconsole always 
shows the following stacktrace for the active compaction thread (for the 1.0.3 
machine):

{noformat}
Stack trace: 
 
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851)
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835)
org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826)
org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77)
org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:133)
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
org.apache.cassandra.db.compaction.CompactionManager$4.call(CompactionManager.java:277)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
{noformat}

I have not really had time to debug the issue further but given the stack trace 
and column family characteristics, I suspect some problem with the column 
family column iterator .remove() method. If the underlying ISortedColumns 
implementation is a ArrayBackedSortedColumns instance, then repeated .remove() 
calls could be incredibly inefficient.

  was:
Twice now (on different nodes), I have observed major compaction for certain 
column families take *significantly* longer on 1.0.3 in comparison to 0.8.6. 
For example,

On the 0.8.6 node, the post compaction log message:
{noformat}CompactionManager.java (line 608) Compacted to XXX. 339,164,959,170 
to 158,825,469,883 (~46% of original) bytes for 25,996 keys.  Time: 
26,934,317ms.{noformat}

On the 1.0.3 node, the post compaction log message:
{noformat} CompactionTask.java (line 213) Compacted to [XXX].  222,338,354,529 
to 147,751,403,084 (~66% of original) bytes for 26,100 keys at 0.562045MB/s. 
Time: 250,703,563ms.{noformat}

So... literally an order of magnitude slower on 1.0.3 in comparison to 0.8.6.

Relevant configuration settings:
* compaction_throughput_mb_per_sec: 0 (why? because the compaction throttling 
logic as currently implemented is highly unsuitable for wide rows but thats a 
different issue)
* in_memory_compaction_limit_in_mb: 128

Column family characteristics:
* Many wide rows (~5% of rows greater than > 10MB and hundreds of rows greater 
than 100 MB, with many small columns).
* Heavy use of expiring columns - each row represents data for a particular 
hour so typically all columns in the row will expire together.
* The significant size shrinkage as reported by the log messages is due mainly 
to expired data being cleaned up (I typically trigger major compaction when 
30-50% of the on disk data has expired which is about once every 3 weeks per 
node).
* Perhaps obviously: size tiered compaction and no compression (the schema has 
not changed since the partial upgrade to 1.0.x)
* Standard column family

Performance notes during compaction:
* Nice CPU usage and load average is basically the same between 0.8.6 and 1.0.3 
- ie, compaction IS running and is not getting stalled or hung up anywhere. 
* Compaction is IO bound on the 0.8.6 machines - the disks see heavy, constant 
utilization when compaction is running.
* Compaction is uses virtually no CPU on the 1.0.3 machines - disk utilization 
is virtually no different when compacting vs not compacting (but at the same 
time, CPU usage and load average clearly indicate that compaction IS running).

Finally, I have not had time to profile more thoroughly but jconsole always 
shows the following stacktrace for the active compaction thread (for the 1.0.3 
machine):

{noformat}
Stack trace: 
 
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851)
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835)
org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826)
org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77)
org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:133)
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
org.apache.cassandra.db.compaction.CompactionManager$4.call(CompactionManager.java:277)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
{noformat}

I have not really had time to debug the issue further but given the stack trace 
and column family characteristics, I suspect some problem with the column 
family column iterator .remove() method. If the underlying ISortedColumns 
implementation is a ArrayBackedSortedColumns instance, then repeated .remove() 
calls could be incredibly inefficient.

    
> Major Compaction Incredibly Slow
> --------------------------------
>
>                 Key: CASSANDRA-3592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>         Environment: RHEL6 - 24 core machines 24 GB mem total, 11 GB java heap
> java version "1.6.0_26"
> 6 node cluster ([email protected], [email protected])
>            Reporter: Dan Hendry
>              Labels: compaction
>
> Twice now (on different nodes), I have observed major compaction for certain 
> column families take *significantly* longer on 1.0.3 in comparison to 0.8.6. 
> For example,
> On the 0.8.6 node, the post compaction log message:
> {noformat}CompactionManager.java (line 608) Compacted to XXX. 339,164,959,170 
> to 158,825,469,883 (~46% of original) bytes for 25,996 keys.  Time: 
> 26,934,317ms.{noformat}
> On the 1.0.3 node, the post compaction log message:
> {noformat} CompactionTask.java (line 213) Compacted to [XXX].  
> 222,338,354,529 to 147,751,403,084 (~66% of original) bytes for 26,100 keys 
> at 0.562045MB/s. Time: 250,703,563ms.{noformat}
> So... literally an order of magnitude slower on 1.0.3 in comparison to 0.8.6.
> Relevant configuration settings:
> * compaction_throughput_mb_per_sec: 0 (why? because the compaction throttling 
> logic as currently implemented is highly unsuitable for wide rows but thats a 
> different issue)
> * in_memory_compaction_limit_in_mb: 128
> Column family characteristics:
> * Many wide rows (~5% of rows greater than > 10MB and hundreds of rows 
> greater than 100 MB, with many small columns).
> * Heavy use of expiring columns - each row represents data for a particular 
> hour so typically all columns in the row will expire together.
> * The significant size shrinkage as reported by the log messages is due 
> mainly to expired data being cleaned up (I typically trigger major compaction 
> when 30-50% of the on disk data has expired which is about once every 3 weeks 
> per node).
> * Perhaps obviously: size tiered compaction and no compression (the schema 
> has not changed since the partial upgrade to 1.0.x)
> * Standard column family
> Performance notes during compaction:
> * Nice CPU usage and load average is basically the same between 0.8.6 and 
> 1.0.3 - ie, compaction IS running and is not getting stalled or hung up 
> anywhere. 
> * Compaction is IO bound on the 0.8.6 machines - the disks see heavy, 
> constant utilization when compaction is running.
> * Compaction is uses virtually no IO on the 1.0.3 machines - disk utilization 
> is virtually no different when compacting vs not compacting (but at the same 
> time, CPU usage and load average clearly indicate that compaction IS running).
> Finally, I have not had time to profile more thoroughly but jconsole always 
> shows the following stacktrace for the active compaction thread (for the 
> 1.0.3 machine):
> {noformat}
> Stack trace: 
>  
> org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851)
> org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835)
> org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826)
> org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77)
> org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
> org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:133)
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
> org.apache.cassandra.db.compaction.CompactionManager$4.call(CompactionManager.java:277)
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> java.lang.Thread.run(Thread.java:662)
> {noformat}
> I have not really had time to debug the issue further but given the stack 
> trace and column family characteristics, I suspect some problem with the 
> column family column iterator .remove() method. If the underlying 
> ISortedColumns implementation is a ArrayBackedSortedColumns instance, then 
> repeated .remove() calls could be incredibly inefficient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3592) Major Compaction Incredibly Slow

Reply via email to