[ https://issues.apache.org/jira/browse/CASSANDRA-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Manley updated CASSANDRA-11447:
------------------------------------
    Description: 
When writing heavily to one of my Cassandra tables, I got a deadlock similar to 
CASSANDRA-9882:

{code}
"MemtableFlushWriter:4589" #34721 daemon prio=5 os_prio=0 
tid=0x0000000005fc11d0 nid=0x7664 waiting for monitor entry [0x00007fb83f0e5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:266)
        - waiting to lock <0x0000000400956258> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
        at 
org.apache.cassandra.db.lifecycle.Tracker.notifyAdded(Tracker.java:400)
        at 
org.apache.cassandra.db.lifecycle.Tracker.replaceFlushed(Tracker.java:332)
        at 
org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:235)
        at 
org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1580)
        at 
org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:362)
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at 
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
        at 
org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1139)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
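
For reference, here is a minimal, self-contained Java sketch of the two-monitor pattern this trace suggests.  The class and lock names are purely illustrative (not Cassandra's actual code), and it assumes, as in CASSANDRA-9882, that some other thread holds the compaction-strategy monitor while waiting on flush-side state:

{code}
// Hypothetical minimal reproduction of a lock-ordering deadlock between a
// flush thread and a compaction thread.  Names are illustrative only.
public class FlushNotifyDeadlockSketch
{
    // Stand-in for the compaction strategy whose handleNotification() is synchronized.
    private static final Object strategyMonitor = new Object();
    // Stand-in for the flush/Tracker-side state guarded by its own monitor.
    private static final Object trackerMonitor = new Object();

    public static void main(String[] args)
    {
        Thread flushWriter = new Thread(() -> {
            synchronized (trackerMonitor)          // flush path holds its own lock...
            {
                pause();
                synchronized (strategyMonitor)     // ...then notifies the strategy
                {
                    System.out.println("flush notified strategy");
                }
            }
        }, "MemtableFlushWriter");

        Thread compaction = new Thread(() -> {
            synchronized (strategyMonitor)         // strategy work holds the strategy lock...
            {
                pause();
                synchronized (trackerMonitor)      // ...then touches flush-side state
                {
                    System.out.println("compaction updated tracker");
                }
            }
        }, "CompactionExecutor");

        flushWriter.start();
        compaction.start();
        // With the pauses above, each thread ends up BLOCKED waiting for the
        // monitor the other one already owns -- the same state jstack reports.
    }

    private static void pause()
    {
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}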

The compaction strategies in this keyspace are mixed: one table uses LCS and 
the rest use DTCS.  None of the tables here, save for the LCS one, seems to 
have a large SSTable count:

{code}
                Table: active_counters
                SSTable count: 2
--

                Table: aggregation_job_entries
                SSTable count: 2
--

                Table: dsp_metrics_log
                SSTable count: 207
--

                Table: dsp_metrics_ts_5min
                SSTable count: 3
--

                Table: dsp_metrics_ts_day
                SSTable count: 2
--

                Table: dsp_metrics_ts_hour
                SSTable count: 2
{code}

Yet the symptoms are similar. 

The "dsp_metrics_ts_5min" table had had a major compaction shortly before all 
this to get rid of the 400+ SStable files before this system went into use, but 
they should have been eliminated.

Have other people seen this?  I am attaching a stack trace.
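
As a side note, beyond jstack, the JVM can report monitor deadlocks programmatically.  Here is a small, self-contained Java sketch using the standard ThreadMXBean API (not anything Cassandra ships) that prints threads deadlocked on object monitors in the current JVM:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Prints any threads the JVM considers deadlocked on object monitors or
// ownable synchronizers; purely illustrative, run inside the JVM of interest.
public class DeadlockCheck
{
    public static void main(String[] args)
    {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads();   // null if none found
        if (deadlocked == null)
        {
            System.out.println("No deadlocked threads detected");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(deadlocked, Integer.MAX_VALUE))
            System.out.println(info);
    }
}
{code}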

Thanks!


> Flush writer deadlock in Cassandra 2.2.5
> ----------------------------------------
>
>                 Key: CASSANDRA-11447
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11447
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Mark Manley
>         Attachments: cassandra.jstack.out
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
