[ 
https://issues.apache.org/jira/browse/CASSANDRA-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688866#comment-13688866
 ] 

Jonathan Ellis commented on CASSANDRA-5668:
-------------------------------------------

Okay, here's what's happening:

{noformat}
 INFO [Thrift:1] 2013-06-19 23:36:51,719 Tracing.java (line 176) session 0702a620-d963-11e2-832d-53376523a4a2 is complete

java.lang.AssertionError: Asked to trace TYPE:MUTATION VERB:MUTATION for session 0702a620-d963-11e2-832d-53376523a4a2 but that state does not exist
{noformat}

cqlsh is requesting QUORUM CL (or ONE?), so once that's achieved the coordinator 
sends success to the client and closes the tracing session.

If other messages have not yet gone out by then, we error when we try to trace 
them.

But it gets worse...

Once the coordinator's state is discarded, any late-arriving replies will 
create a new, "non-local" session.  Since the coordinator will not send any 
messages again for this session -- which is the trigger we use on replicas to 
indicate "we're done" -- the non-local session will persist indefinitely, 
"leaking" memory.

I think we can solve both of these:
# Make a static TraceState method that only needs the session id to be passed in 
to log an event.  OTC (OutboundTcpConnection) can use this to avoid having to 
look up the TraceState at all; if it's been cleared out, not a problem.
# Make Tracing.sessions an expiring map so sessions we don't clean up manually 
still get removed.
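
To make the two proposals above concrete, here is a minimal sketch in plain Java. It is not Cassandra's code: the class ExpiringTraceSessions, its State holder, the TTL parameter, and the traceEvent method are all hypothetical names chosen for illustration, and a real patch would more likely reuse an existing expiring-map utility than hand-roll one.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not the actual Tracing/TraceState classes. */
final class ExpiringTraceSessions {
    /** Minimal stand-in for per-session trace state (assumption). */
    static final class State {
        final long createdAtMillis;
        State(long createdAtMillis) { this.createdAtMillis = createdAtMillis; }
    }

    private final Map<UUID, State> sessions = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ExpiringTraceSessions(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** Registers a session, e.g. for a traced request or a late-arriving reply. */
    void begin(UUID sessionId, long nowMillis) {
        sessions.putIfAbsent(sessionId, new State(nowMillis));
    }

    /** Manual cleanup path; sessions that never reach it are caught by expire(). */
    void stop(UUID sessionId) {
        sessions.remove(sessionId);
    }

    /** Proposal 2: drop every session older than the TTL; returns the number removed. */
    int expire(long nowMillis) {
        int removed = 0;
        for (Iterator<State> it = sessions.values().iterator(); it.hasNext();) {
            if (nowMillis - it.next().createdAtMillis >= ttlMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return sessions.size();
    }

    /**
     * Proposal 1: format an event from the session id alone, with no state lookup,
     * so a session that has already been cleared is not an error.
     */
    static String traceEvent(UUID sessionId, String message) {
        return sessionId + ": " + message;
    }
}
```

With something along these lines, a non-local session created by a late reply would be dropped the next time expire() runs after its TTL, and the sender could call traceEvent without caring whether the coordinator has already closed the session.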

Alternatively, we could just go with #2 by itself and not try to clean up 
manually at all.  Average-case memory use will be worse, but maybe that is 
okay since we assume only a tiny fraction of requests are traced at all.

What do you think [~slebresne]?
                
> NPE in net.OutputTcpConnection when tracing is enabled
> ------------------------------------------------------
>
>                 Key: CASSANDRA-5668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5668
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.6, 2.0 beta 1
>            Reporter: Ryan McGuire
>         Attachments: 5668-assert-2.txt, 5668-assert.txt, 5668-logs.tar.gz, 
> 5668_npe_ddl.cql, 5668_npe_insert.cql, system.log
>
>
> I get multiple NullPointerExceptions when trying to trace INSERT statements.
> To reproduce:
> {code}
> $ ccm create -v git:trunk
> $ ccm populate -n 3
> $ ccm start
> $ ccm node1 cqlsh < 5668_npe_ddl.cql
> $ ccm node1 cqlsh < 5668_npe_insert.cql
> {code}
> And see many exceptions like this in the logs of node1:
> {code}
> ERROR [WRITE-/127.0.0.3] 2013-06-19 14:54:35,885 OutboundTcpConnection.java (line 197) error writing to /127.0.0.3
> java.lang.NullPointerException
>         at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:182)
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:144)
> {code}
> This is similar to CASSANDRA-5658 and is the reason that npe_ddl and 
> npe_insert are separate files.
