[jira] [Commented] (CASSANDRA-5498) Possible NPE on EACH_QUORUM writes

Jason Brown (JIRA) Mon, 03 Jun 2013 11:55:12 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673426#comment-13673426
 ]


Jason Brown commented on CASSANDRA-5498:
----------------------------------------

[~jjordan] working on it now in #cassandra-dev on IRC. My suspicion is a problem
with Gossiper.addSavedEndpoint(), which clears out the endpoint's previous data
from the endpointStateMap when a node with a greater messaging version attempts
to connect. That in turn has a downstream effect in DSWRH
(DatacenterSyncWriteResponseHandler) when it requests the DC data from the
Ec2Snitch, which gets it from Gossiper.endpointStateMap.
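To make the suspected sequence concrete, here is a hypothetical, heavily simplified model in plain Java; it is not Cassandra code. SavedEndpointModel, its endpointStateMap, and the FALLBACK_DC placeholder are illustrative assumptions: the map stands in for Gossiper.endpointStateMap, addSavedEndpoint() models the peer's state being discarded, and getDatacenter() models a snitch that reads a remote endpoint's DC from gossip and, by assumption, falls back to a placeholder name when no state is present.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the suspected sequence. These members are
 * stand-ins for illustration only, not Cassandra's Gossiper or Ec2Snitch.
 */
class SavedEndpointModel {

    /** Assumed placeholder DC name returned when no gossip state is known. */
    static final String FALLBACK_DC = "unknown-dc";

    /** Stand-in for Gossiper.endpointStateMap, reduced to endpoint -> advertised DC. */
    static final Map<String, String> endpointStateMap = new HashMap<>();

    /**
     * Stand-in for Gossiper.addSavedEndpoint(): per the suspicion above, it runs when a
     * peer with a greater messaging version connects and discards the peer's prior state.
     */
    static void addSavedEndpoint(String endpoint) {
        endpointStateMap.remove(endpoint);
    }

    /** Stand-in for a gossip-backed snitch: a remote endpoint's DC comes from gossip state. */
    static String getDatacenter(String endpoint) {
        return endpointStateMap.getOrDefault(endpoint, FALLBACK_DC);
    }

    public static void main(String[] args) {
        endpointStateMap.put("10.0.0.2", "us-east");
        System.out.println("before: " + getDatacenter("10.0.0.2"));

        // Peer reconnects with a newer messaging version; its gossip state is dropped.
        addSavedEndpoint("10.0.0.2");
        System.out.println("after:  " + getDatacenter("10.0.0.2"));
    }
}
{code}

If the snitch hands back a DC name that the keyspace does not list, any per-DC bookkeeping keyed only by the keyspace's DCs will not find it, which would be consistent with the symptom in the ticket description.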

Here's the server-side stacktrace:

{code}
ERROR [RPC-Thread:150339] 2013-05-08 17:29:55,048 Cassandra.java (line 3462) Internal error processing batch_mutate
java.lang.NullPointerException
	at org.apache.cassandra.service.DatacenterSyncWriteResponseHandler.assureSufficientLiveNodes(DatacenterSyncWriteResponseHandler.java:109)
	at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:253)
	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:194)
	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:639)
	at org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:590)
	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:598)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3454)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
	at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
	at org.apache.cassandra.thrift.CustomTHsHaServer$Invocation.run(CustomTHsHaServer.java:105)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}

> Possible NPE on EACH_QUORUM writes
> ----------------------------------
>
>                 Key: CASSANDRA-5498
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5498
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.10
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: each_quorum, ec2
>             Fix For: 1.2.6
>
>         Attachments: 5498-v1.patch, 5498-v2.patch
>
>
> When upgrading from 1.0 to 1.1, we observed that 
> DatacenterSyncWriteResponseHandler.assureSufficientLiveNodes() can throw an 
> NPE if one of the writeEndpoints has a DC that is not listed in the keyspace 
> while one of the nodes is down. We observed this while running in EC2 with
> the Ec2Snitch. The exception was typically brief, but a certain segment of
> writes (using EACH_QUORUM) failed during that time.
> This ticket will address the NPE in DSWRH, while a follow-up ticket will be
> created once we get to the bottom of the incorrect DC being reported by
> Ec2Snitch.
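A hypothetical sketch of the failure mode described above, assuming the handler counts live replicas in a map pre-populated only with the keyspace's DCs. EachQuorumCheck and its methods are illustrative names; this is neither the DSWRH source nor the attached patches. An endpoint whose DC is missing from that map makes get() return null, and dereferencing it throws the NPE; the guarded variant skips such endpoints instead.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical model of EACH_QUORUM live-replica counting; the names are
 * illustrative and this is not the DatacenterSyncWriteResponseHandler source.
 */
class EachQuorumCheck {

    /** Unguarded: an endpoint whose DC is not a key of rfPerDc makes get() return null -> NPE. */
    static Map<String, Integer> countUnguarded(Map<String, Integer> rfPerDc,
                                               List<String> liveEndpoints,
                                               Function<String, String> dcOf) {
        Map<String, AtomicInteger> counts = emptyCounts(rfPerDc);
        for (String endpoint : liveEndpoints)
            counts.get(dcOf.apply(endpoint)).incrementAndGet();
        return snapshot(counts);
    }

    /** Guarded: endpoints in a DC the keyspace does not list are skipped, not dereferenced. */
    static Map<String, Integer> countGuarded(Map<String, Integer> rfPerDc,
                                             List<String> liveEndpoints,
                                             Function<String, String> dcOf) {
        Map<String, AtomicInteger> counts = emptyCounts(rfPerDc);
        for (String endpoint : liveEndpoints) {
            AtomicInteger count = counts.get(dcOf.apply(endpoint));
            if (count != null)
                count.incrementAndGet();
        }
        return snapshot(counts);
    }

    /** EACH_QUORUM needs (rf / 2) + 1 live replicas in every DC of the keyspace. */
    static boolean hasQuorumInEachDc(Map<String, Integer> rfPerDc, Map<String, Integer> liveCounts) {
        for (Map.Entry<String, Integer> e : rfPerDc.entrySet()) {
            int quorum = e.getValue() / 2 + 1;
            if (liveCounts.getOrDefault(e.getKey(), 0) < quorum)
                return false;
        }
        return true;
    }

    // One zeroed counter per DC listed in the keyspace.
    private static Map<String, AtomicInteger> emptyCounts(Map<String, Integer> rfPerDc) {
        Map<String, AtomicInteger> counts = new HashMap<>();
        for (String dc : rfPerDc.keySet())
            counts.put(dc, new AtomicInteger());
        return counts;
    }

    // Copy the counters into plain integers for the caller.
    private static Map<String, Integer> snapshot(Map<String, AtomicInteger> counts) {
        Map<String, Integer> out = new HashMap<>();
        counts.forEach((dc, c) -> out.put(dc, c.get()));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> rfPerDc = Map.of("us-east", 3, "us-west", 3);
        Map<String, String> dcByEndpoint = Map.of(
                "e1", "us-east", "e2", "us-east", "w1", "us-west", "w2", "us-west",
                "x1", "unknown-dc"); // assumed bogus DC reported for one endpoint
        List<String> live = List.of("e1", "e2", "w1", "w2", "x1");

        try {
            countUnguarded(rfPerDc, live, dcByEndpoint::get);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        Map<String, Integer> counts = countGuarded(rfPerDc, live, dcByEndpoint::get);
        System.out.println("guarded quorum ok: " + hasQuorumInEachDc(rfPerDc, counts));
    }
}
{code}

Skipping an endpoint from an unlisted DC can only under-count live replicas, so the availability check becomes more conservative rather than throwing; whether that is the right behaviour is a decision for the actual patch.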

