Re: cassandra 3.11.4 deadlock

2020-06-23 Thread John Doe
https://issues.apache.org/jira/browse/CASSANDRA-15898


> Sent: Tuesday, June 23, 2020 at 4:16 PM
> From: "John Doe" 
> To: user@cassandra.apache.org
> Subject: Re: cassandra 3.11.4 deadlock
>
> 
> Yes
> here is one more example, happended on another node
>  
> using TWCS
> compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
>   'compaction_window_size': '30',
>   'compaction_window_unit': 'MINUTES',
>   'max_threshold': '32',
>   'min_threshold': '4'}
> 
> ##
> "CompactionExecutor:244228":
>   waiting to lock Monitor@0x7bb425df7ee8 (Object@0x7fd71f91e2d8, a 
> org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy),
>   which is held by "CompactionExecutor:244235"
> "CompactionExecutor:244235":
>   waiting to lock Monitor@0x7bb3d83ea188 (Object@0x7fd487f937b0, a 
> java/io/ExpiringCache),
>   which is held by "CompactionExecutor:244228"
> 
> 
> I didnt know which thread is it, so I got all CompactionExecutor and 
> SizeTieredCompactionStrategy threads.
> 
> 
> ### CompactionExecutor Threads
> 
> Thread 111461: (state = BLOCKED)
>  - java.io.ExpiringCache.get(java.lang.String) @bci=0, line=75 (Compiled 
> frame)
>  - java.io.UnixFileSystem.canonicalize(java.lang.String) @bci=17, line=152 
> (Compiled frame)
>  - java.io.File.getCanonicalPath() @bci=27, line=618 (Compiled frame)
>  - java.io.FilePermission$1.run() @bci=103, line=215 (Compiled frame)
>  - java.io.FilePermission$1.run() @bci=1, line=203 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedAction) 
> @bci=0 (Compiled frame)
>  - java.io.FilePermission.init(int) @bci=97, line=203 (Compiled frame)
>  - java.io.FilePermission.(java.lang.String, java.lang.String) @bci=10, 
> line=277 (Compiled frame)
>  - java.lang.SecurityManager.checkRead(java.lang.String) @bci=8, line=888 
> (Compiled frame)
>  - java.io.File.length() @bci=13, line=969 (Compiled frame)
>  - org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly() 
> @bci=87, line=284 (Compiled frame)
>  - 
> org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(org.apache.cassandra.db.DecoratedKey)
>  @bci=111, line=179 (Compiled frame)
>  - 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(org.apache.cassandra.db.rows.UnfilteredRowIterator)
>  @bci=9, line=134 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(org.apache.cassandra.db.rows.UnfilteredRowIterator)
>  @bci=5, line=65 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(org.apache.cassandra.db.rows.UnfilteredRowIterator)
>  @bci=12, line=142 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionTask.runMayThrow() @bci=490, 
> line=201 (Compiled frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Compiled 
> frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=6, line=85 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
>  @bci=2, line=61 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
>  @bci=128, line=266 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 
> (Compiled frame)
>  - 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
>  @bci=1, line=81 (Compiled frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$6.run() @bci=4 
> (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> Thread 111416: (state = BLOCKED)
>  - java.io.ExpiringCache.get(java.lang.String) @bci=0, line=75 (Compiled 
> frame)
>  - java.io.UnixFileSystem.canonicalize(java.lang.String) @bci=17, line=152 
> (Compiled frame)
>  - java.io.File.getCanonicalPath() @bci=27, line=6

Re: cassandra 3.11.4 deadlock

2020-06-23 Thread John Doe
actory$$Lambda$6.run() @bci=4 
(Compiled frame)
 - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
 
 
 
 
 
 
 

Sent: Tuesday, June 23, 2020 at 3:56 PM
From: "Jeff Jirsa" 
To: "cassandra" 
Subject: Re: cassandra 3.11.4 deadlock

Can you open a JIRA? 
 
https://issues.apache.org/jira/projects/CASSANDRA/issues 

On Tue, Jun 23, 2020 at 11:56 AM John Doe 
mailto:john@discofan.com]> wrote:We are running 
apache-cassandra-3.11.4, 10 node cluster with -Xms32G -Xmx32G -Xmn8G using CMS.
after running couple of days one of the node become unresponsive and threaddump 
(jstack -F) shows deadlock.

Found one Java-level deadlock:
=

"Native-Transport-Requests-144":
  waiting to lock Monitor@0x7cd5142e4d08 (Object@0x7f6e00348268, a 
java/io/ExpiringCache),
  which is held by "CompactionExecutor:115134"
"CompactionExecutor:115134":
  waiting to lock Monitor@0x7f6bcaf130f8 (Object@0x7f6dff31faa0, a 
ch/qos/logback/core/joran/spi/ConfigurationWatchList),
  which is held by "Native-Transport-Requests-144"

Found a total of 1 deadlock.

I have seen this couple of time now with different nodes with following in 
system.log

 IndexSummaryRedistribution.java:77 - Redistributing index summaries
 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot 
allocate chunk of 1048576

also lookin in gc log there has not been a ParNew collection for last 10hrs, 
only CMS collections.

1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)]
1739842.375: [Rescan (parallel) , 0.0614157 secs]
1739842.437: [weak refs processing, 0.994 secs]
1739842.437: [class unloading, 0.0231076 secs]
1739842.460: [scrub symbol table, 0.0061049 secs]
1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark: 
17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times: user=2.95 
sys=0.00, real=0.09 secs]
1739842.471: [CMS-concurrent-sweep-start]
1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13 
sys=0.00, real=6.10 secs]
1739848.573: [CMS-concurrent-reset-start]
1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08 
sys=0.00, real=0.08 secs]
1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark: 17696837K(25165824K)] 
20409111K(32715584K), 0.0584838 secs] [Times: user=2.68 sys=0.00, real=0.06 
secs]
1739858.713: [CMS-concurrent-mark-start]
1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77 
sys=0.00, real=1.79 secs]
1739860.497: [CMS-concurrent-preclean-start]
1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07 
sys=0.00, real=0.07 secs]
1739860.567: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time
1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times: 
user=5.80 sys=0.00, real=5.76 secs]

Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE 
(1.8.0_162-b12)
Memory: 4k page, physical 792290076k(2780032k free), swap 16777212k(16693756k 
free)

CommandLine flags:
-XX:+AlwaysPreTouch
-XX:CICompilerCount=15
-XX:+CMSClassUnloadingEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:CMSInitiatingOccupancyFraction=40
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSParallelRemarkEnabled
-XX:CMSWaitDuration=1
-XX:ConcGCThreads=50
-XX:+CrashOnOutOfMemoryError
-XX:GCLogFileSize=10485760
-XX:+HeapDumpOnOutOfMemoryError
-XX:InitialHeapSize=34359738368
-XX:InitialTenuringThreshold=1
-XX:+ManagementServer
-XX:MaxHeapSize=34359738368
-XX:MaxNewSize=8589934592
-XX:MaxTenuringThreshold=1
-XX:MinHeapDeltaBytes=196608
-XX:NewSize=8589934592
-XX:NumberOfGCLogFiles=10
-XX:OldPLABSize=16
-XX:OldSize=25769803776
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ParallelGCThreads=50
-XX:+PerfDisableSharedMem
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+ResizeTLAB
-XX:StringTableSize=103
-XX:SurvivorRatio=8
-XX:ThreadPriorityPolicy=42
-XX:ThreadStackSize=256
-XX:-UseBiasedLocking
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseCondCardMark
-XX:+UseFastUnorderedTimeStamps
-XX:+UseGCLogFileRotation
-XX:+UseNUMA
-XX:+UseNUMAInterleaving
-XX:+UseParNewGC
-XX:+UseTLAB
-XX:+UseThreadPriorities

--Thanks

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org[mailto:user-unsubscr...@cassandra.apache.org]
For additional commands, e-mail: 
user-h...@cassandra.apache.org[mailto:user-h...@cassandra.apache.org]
 

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: cassandra 3.11.4 deadlock

2020-06-23 Thread Jeff Jirsa
Can you open a JIRA?

https://issues.apache.org/jira/projects/CASSANDRA/issues

On Tue, Jun 23, 2020 at 11:56 AM John Doe  wrote:

> We are running apache-cassandra-3.11.4, 10 node cluster with -Xms32G
> -Xmx32G -Xmn8G using CMS.
> after running couple of days one of the node become unresponsive and
> threaddump (jstack -F) shows deadlock.
>
> Found one Java-level deadlock:
> =
>
> "Native-Transport-Requests-144":
>   waiting to lock Monitor@0x7cd5142e4d08 (Object@0x7f6e00348268,
> a java/io/ExpiringCache),
>   which is held by "CompactionExecutor:115134"
> "CompactionExecutor:115134":
>   waiting to lock Monitor@0x7f6bcaf130f8 (Object@0x7f6dff31faa0,
> a ch/qos/logback/core/joran/spi/ConfigurationWatchList),
>   which is held by "Native-Transport-Requests-144"
>
> Found a total of 1 deadlock.
>
> I have seen this couple of time now with different nodes with following in
> system.log
>
>  IndexSummaryRedistribution.java:77 - Redistributing index summaries
>  NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot
> allocate chunk of 1048576
>
> also lookin in gc log there has not been a ParNew collection for last
> 10hrs, only CMS collections.
>
> 1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)]
> 1739842.375: [Rescan (parallel) , 0.0614157 secs]
> 1739842.437: [weak refs processing, 0.994 secs]
> 1739842.437: [class unloading, 0.0231076 secs]
> 1739842.460: [scrub symbol table, 0.0061049 secs]
> 1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark:
> 17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times:
> user=2.95 sys=0.00, real=0.09 secs]
> 1739842.471: [CMS-concurrent-sweep-start]
> 1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13
> sys=0.00, real=6.10 secs]
> 1739848.573: [CMS-concurrent-reset-start]
> 1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08
> sys=0.00, real=0.08 secs]
> 1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark:
> 17696837K(25165824K)] 20409111K(32715584K), 0.0584838 secs] [Times:
> user=2.68 sys=0.00, real=0.06 secs]
> 1739858.713: [CMS-concurrent-mark-start]
> 1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77
> sys=0.00, real=1.79 secs]
> 1739860.497: [CMS-concurrent-preclean-start]
> 1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07
> sys=0.00, real=0.07 secs]
> 1739860.567: [CMS-concurrent-abortable-preclean-start]
> CMS: abort preclean due to time
> 1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times:
> user=5.80 sys=0.00, real=5.76 secs]
>
> Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE
> (1.8.0_162-b12)
> Memory: 4k page, physical 792290076k(2780032k free), swap
> 16777212k(16693756k free)
>
> CommandLine flags:
> -XX:+AlwaysPreTouch
> -XX:CICompilerCount=15
> -XX:+CMSClassUnloadingEnabled
> -XX:+CMSEdenChunksRecordAlways
> -XX:CMSInitiatingOccupancyFraction=40
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSParallelRemarkEnabled
> -XX:CMSWaitDuration=1
> -XX:ConcGCThreads=50
> -XX:+CrashOnOutOfMemoryError
> -XX:GCLogFileSize=10485760
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:InitialHeapSize=34359738368
> -XX:InitialTenuringThreshold=1
> -XX:+ManagementServer
> -XX:MaxHeapSize=34359738368
> -XX:MaxNewSize=8589934592
> -XX:MaxTenuringThreshold=1
> -XX:MinHeapDeltaBytes=196608
> -XX:NewSize=8589934592
> -XX:NumberOfGCLogFiles=10
> -XX:OldPLABSize=16
> -XX:OldSize=25769803776
> -XX:OnOutOfMemoryError=kill -9 %p
> -XX:ParallelGCThreads=50
> -XX:+PerfDisableSharedMem
> -XX:+PrintGC
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+ResizeTLAB
> -XX:StringTableSize=103
> -XX:SurvivorRatio=8
> -XX:ThreadPriorityPolicy=42
> -XX:ThreadStackSize=256
> -XX:-UseBiasedLocking
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseCondCardMark
> -XX:+UseFastUnorderedTimeStamps
> -XX:+UseGCLogFileRotation
> -XX:+UseNUMA
> -XX:+UseNUMAInterleaving
> -XX:+UseParNewGC
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
>
> --Thanks
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


cassandra 3.11.4 deadlock

2020-06-23 Thread John Doe
We are running apache-cassandra-3.11.4, 10 node cluster with -Xms32G -Xmx32G 
-Xmn8G using CMS.
after running couple of days one of the node become unresponsive and threaddump 
(jstack -F) shows deadlock.

Found one Java-level deadlock:
=

"Native-Transport-Requests-144":
  waiting to lock Monitor@0x7cd5142e4d08 (Object@0x7f6e00348268, a 
java/io/ExpiringCache),
  which is held by "CompactionExecutor:115134"
"CompactionExecutor:115134":
  waiting to lock Monitor@0x7f6bcaf130f8 (Object@0x7f6dff31faa0, a 
ch/qos/logback/core/joran/spi/ConfigurationWatchList),
  which is held by "Native-Transport-Requests-144"

Found a total of 1 deadlock.

I have seen this couple of time now with different nodes with following in 
system.log

 IndexSummaryRedistribution.java:77 - Redistributing index summaries
 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot 
allocate chunk of 1048576

also lookin in gc log there has not been a ParNew collection for last 10hrs, 
only CMS collections.

1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)]
1739842.375: [Rescan (parallel) , 0.0614157 secs]
1739842.437: [weak refs processing, 0.994 secs]
1739842.437: [class unloading, 0.0231076 secs]
1739842.460: [scrub symbol table, 0.0061049 secs]
1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark: 
17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times: user=2.95 
sys=0.00, real=0.09 secs]
1739842.471: [CMS-concurrent-sweep-start]
1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13 
sys=0.00, real=6.10 secs]
1739848.573: [CMS-concurrent-reset-start]
1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08 
sys=0.00, real=0.08 secs]
1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark: 17696837K(25165824K)] 
20409111K(32715584K), 0.0584838 secs] [Times: user=2.68 sys=0.00, real=0.06 
secs]
1739858.713: [CMS-concurrent-mark-start]
1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77 
sys=0.00, real=1.79 secs]
1739860.497: [CMS-concurrent-preclean-start]
1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07 
sys=0.00, real=0.07 secs]
1739860.567: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time
1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times: 
user=5.80 sys=0.00, real=5.76 secs]

Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE 
(1.8.0_162-b12)
Memory: 4k page, physical 792290076k(2780032k free), swap 16777212k(16693756k 
free)

CommandLine flags:
-XX:+AlwaysPreTouch
-XX:CICompilerCount=15
-XX:+CMSClassUnloadingEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:CMSInitiatingOccupancyFraction=40
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSParallelRemarkEnabled
-XX:CMSWaitDuration=1
-XX:ConcGCThreads=50
-XX:+CrashOnOutOfMemoryError
-XX:GCLogFileSize=10485760
-XX:+HeapDumpOnOutOfMemoryError
-XX:InitialHeapSize=34359738368
-XX:InitialTenuringThreshold=1
-XX:+ManagementServer
-XX:MaxHeapSize=34359738368
-XX:MaxNewSize=8589934592
-XX:MaxTenuringThreshold=1
-XX:MinHeapDeltaBytes=196608
-XX:NewSize=8589934592
-XX:NumberOfGCLogFiles=10
-XX:OldPLABSize=16
-XX:OldSize=25769803776
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ParallelGCThreads=50
-XX:+PerfDisableSharedMem
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+ResizeTLAB
-XX:StringTableSize=103
-XX:SurvivorRatio=8
-XX:ThreadPriorityPolicy=42
-XX:ThreadStackSize=256
-XX:-UseBiasedLocking
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseCondCardMark
-XX:+UseFastUnorderedTimeStamps
-XX:+UseGCLogFileRotation
-XX:+UseNUMA
-XX:+UseNUMAInterleaving
-XX:+UseParNewGC
-XX:+UseTLAB
-XX:+UseThreadPriorities

--Thanks

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org