[
https://issues.apache.org/jira/browse/CASSANDRA-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688866#comment-13688866
]
Jonathan Ellis commented on CASSANDRA-5668:
-------------------------------------------
Okay, here's what's happening:
{noformat}
INFO [Thrift:1] 2013-06-19 23:36:51,719 Tracing.java (line 176) session
0702a620-d963-11e2-832d-53376523a4a2 is complete
java.lang.AssertionError: Asked to trace TYPE:MUTATION VERB:MUTATION for
session 0702a620-d963-11e2-832d-53376523a4a2 but that state does not exist
{noformat}
cqlsh is requesting QUORUM CL (or ONE?), so once that's achieved the coordinator
sends success to the client and closes the tracing session.
If other messages have not yet gone out at that point, we error.
But it gets worse...
Once the coordinator's state is discarded, any late-arriving replies will
create a new, "non-local" session. Since the coordinator will not send any
messages again for this session -- which is the trigger we use on replicas to
indicate "we're done" -- the non-local session will persist indefinitely,
"leaking" memory.
I think we can solve both of these:
# Make a static TraceState method that only needs the session id to be passed in
to log an event. OTC (OutboundTcpConnection) can use this to avoid having to
look up the TraceState at all; if it's been cleared out, not a problem.
# Make Tracing.sessions an expiring map so sessions we don't clean up manually
still get removed.
Alternatively, we could just go with #2 by itself and not try to clean up
manually at all. Average-case memory use will be worse, but maybe that is
okay since we assume only a tiny fraction of requests are traced at all.
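To make the two proposals concrete, here is a rough sketch, not the actual Cassandra code. TraceStateSketch, ExpiringSessions, the TTL value and the injected clock are all hypothetical names made up for illustration; the real TraceState and Tracing.sessions may look quite different. The static trace method needs only the session id, and the expiring map drops any session older than the TTL even if nobody cleans it up manually.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for per-session trace state (not Cassandra's TraceState). */
final class TraceStateSketch {
    final UUID sessionId;
    final long createdAtMillis;

    TraceStateSketch(UUID sessionId, long createdAtMillis) {
        this.sessionId = sessionId;
        this.createdAtMillis = createdAtMillis;
    }

    /**
     * Proposal 1: log an event given only the session id, with no state lookup.
     * A real implementation would write to the trace tables; this sketch just
     * formats the line so the behavior is easy to check.
     */
    static String trace(UUID sessionId, String message) {
        return sessionId + ": " + message;
    }
}

/** Proposal 2: a session map whose entries expire after a fixed TTL. */
final class ExpiringSessions {
    private final Map<UUID, TraceStateSketch> sessions = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    ExpiringSessions(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the existing state, or creates one (e.g. for a late-arriving reply). */
    TraceStateSketch getOrCreate(UUID sessionId) {
        return sessions.computeIfAbsent(
                sessionId, id -> new TraceStateSketch(id, clock.getAsLong()));
    }

    /** Manual cleanup, still available when we know the session is done. */
    void remove(UUID sessionId) {
        sessions.remove(sessionId);
    }

    /** Drops every session at least ttlMillis old; returns how many were dropped. */
    int expire() {
        long now = clock.getAsLong();
        int removed = 0;
        for (Iterator<TraceStateSketch> it = sessions.values().iterator(); it.hasNext();) {
            if (now - it.next().createdAtMillis >= ttlMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return sessions.size();
    }
}
```

In this sketch expire() would have to be called periodically, for instance from a scheduled task; a library-provided expiring map would serve the same purpose.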
What do you think [~slebresne]?
> NPE in net.OutputTcpConnection when tracing is enabled
> ------------------------------------------------------
>
> Key: CASSANDRA-5668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5668
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.2.6, 2.0 beta 1
> Reporter: Ryan McGuire
> Attachments: 5668-assert-2.txt, 5668-assert.txt, 5668-logs.tar.gz,
> 5668_npe_ddl.cql, 5668_npe_insert.cql, system.log
>
>
> I get multiple NullPointerException when trying to trace INSERT statements.
> To reproduce:
> {code}
> $ ccm create -v git:trunk
> $ ccm populate -n 3
> $ ccm start
> $ ccm node1 cqlsh < 5668_npe_ddl.cql
> $ ccm node1 cqlsh < 5668_npe_insert.cql
> {code}
> And see many exceptions like this in the logs of node1:
> {code}
> ERROR [WRITE-/127.0.0.3] 2013-06-19 14:54:35,885 OutboundTcpConnection.java
> (line 197) error writing to /127.0.0.3
> java.lang.NullPointerException
> at
> org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:182)
> at
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:144)
> {code}
> This is similar to CASSANDRA-5658 and is the reason that npe_ddl and
> npe_insert are separate files.