[
https://issues.apache.org/jira/browse/CASSANDRA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027675#comment-15027675
]
Ariel Weisberg commented on CASSANDRA-10688:
--------------------------------------------
As near as I can tell, the stack overflow is being used as a bound for code
that walks an object graph, doing a depth-first search for a path from an
object's outgoing references back to itself. What gets logged isn't a stack
trace; it's the path through the graph that the search walked (up until it
overflowed). I suspect the overflow is due to the depth of the graph: since
the search is depth first, any moderately large linked list is going to
overflow the stack pretty quickly.
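To illustrate why depth matters here, a minimal hypothetical sketch ({{Node}}, {{DeepListDemo}}, and the list length are made up for the example, not Cassandra or CLHM classes): a recursive depth-first walk uses one call-stack frame per node, so a long singly-linked list exhausts the thread stack long before the walk finishes.

```java
// Hypothetical illustration: one stack frame per node means a long
// singly-linked list overflows the thread stack under recursive DFS.
class Node {
    Node next;
}

public class DeepListDemo {
    // Recursive walk, analogous to a depth-first reference search.
    static int depth(Node n) {
        return n == null ? 0 : 1 + depth(n.next);
    }

    // Build a singly-linked list of the given length.
    static Node buildList(int length) {
        Node head = null;
        for (int i = 0; i < length; i++) {
            Node n = new Node();
            n.next = head;
            head = n;
        }
        return head;
    }

    public static void main(String[] args) {
        System.out.println(depth(buildList(1_000)));   // shallow list: fine
        try {
            depth(buildList(1_000_000));               // deep list
            System.out.println("no overflow");
        } catch (StackOverflowError e) {
            System.out.println("overflowed");          // expected with default -Xss
        }
    }
}
```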
It's also using Stack, which extends the synchronized Vector; we should
probably replace it with ArrayDeque.
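For reference, the replacement would look something like this (a hypothetical sketch with a stand-in string-keyed graph, not the actual Ref code): an explicit heap-backed ArrayDeque both avoids Vector's per-call synchronization and, as a side benefit, makes the traversal depth independent of the call stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: iterative depth-first walk using ArrayDeque in
// place of java.util.Stack (which extends the synchronized Vector).
public class GraphWalk {
    // Graph as adjacency lists keyed by node id; a stand-in for the
    // object graph the leak detector actually walks.
    static Set<String> reachable(Map<String, List<String>> graph, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();   // heap-backed, no recursion
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!seen.add(node))
                continue;                            // already visited
            for (String next : graph.getOrDefault(node, List.of()))
                stack.push(next);
        }
        return seen;
    }
}
```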
This is debug code that only runs if {{-Dcassandra.debugrefcount=true}}, so
this isn't an issue in production deployments. [~jjordan], any idea why that
would be set in your experiment?
For debug purposes the code works as designed: it can recover from the stack
overflow and continue searching the graph, pruning the graph at the point
where the stack overflowed. The only real issue is that the error may be too
noisy. I think we might want to rate limit it, using the first N entries in
the path as a key. I'll put that together.
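Roughly what I have in mind (a hypothetical sketch, not the eventual patch; class name, prefix length, and the non-atomic check-then-put are all placeholders): key each report by its first N path entries and suppress repeats within an interval.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed rate limiting: key each leak report
// by the first N entries of its reference path, and log a given key at
// most once per interval.
public class LeakReportLimiter {
    private static final int KEY_PREFIX = 5;            // first N path entries
    private final long intervalNanos;
    private final Map<List<String>, Long> lastLogged = new ConcurrentHashMap<>();

    public LeakReportLimiter(long intervalNanos) {
        this.intervalNanos = intervalNanos;
    }

    /** Returns true if a report for this path should be logged now. */
    public boolean shouldLog(List<String> path, long nowNanos) {
        List<String> key =
            List.copyOf(path.subList(0, Math.min(KEY_PREFIX, path.size())));
        Long prev = lastLogged.get(key);
        if (prev != null && nowNanos - prev < intervalNanos)
            return false;                               // seen recently: suppress
        // Check-then-put is racy; good enough for best-effort log throttling.
        lastLogged.put(key, nowNanos);
        return true;
    }
}
```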
> Stack overflow from SSTableReader$InstanceTidier.runOnClose in Leak Detector
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-10688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10688
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jeremiah Jordan
> Assignee: Ariel Weisberg
> Fix For: 3.0.1, 3.1
>
>
> Running some tests against cassandra-3.0
> 9fc957cf3097e54ccd72e51b2d0650dc3e83eae0
> The tests are just running cassandra-stress write and read while adding and
> removing nodes from the cluster. After the test runs, when I go back through
> the logs I find the following stack overflow fairly often:
> ERROR [Strong-Reference-Leak-Detector:1] 2015-11-11 00:04:10,638
> Ref.java:413 - Stackoverflow [private java.lang.Runnable
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.runOnClose,
> final java.lang.Runnable
> org.apache.cassandra.io.sstable.format.SSTableReader$DropPageCache.andThen,
> final org.apache.cassandra.cache.InstrumentingCache
> org.apache.cassandra.io.sstable.SSTableRewriter$InvalidateKeys.cache, private
> final org.apache.cassandra.cache.ICache
> org.apache.cassandra.cache.InstrumentingCache.map, private final
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap
> org.apache.cassandra.cache.ConcurrentLinkedHashCache.map, final
> com.googlecode.concurrentlinkedhashmap.LinkedDeque
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.evictionDeque,
> com.googlecode.concurrentlinkedhashmap.Linked
> com.googlecode.concurrentlinkedhashmap.LinkedDeque.first,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> ....... (repeated a whole bunch more) ....
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
> final java.lang.Object
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.key,
> public final byte[] org.apache.cassandra.cache.KeyCacheKey.key
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)