[ https://issues.apache.org/jira/browse/CASSANDRA-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673426#comment-13673426 ]
Jason Brown commented on CASSANDRA-5498:
----------------------------------------
[~jjordan] I'm working on it now on #cassandra-dev IRC. My suspicion is a problem
with Gossiper.addSavedEndpoint(), which clears out the endpoint's previous data
from the endpointStateMap when a node with a greater messaging version attempts
to connect. That in turn causes the downstream effect in DSWRH when it requests
the DC data from the Ec2Snitch, which gets it from Gossiper.endpointStateMap.
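
To make the suspected chain concrete, here is a minimal sketch of a snitch that reads DC names from gossip-style state. Everything in it is hypothetical: the class, the method names, and the "UNKNOWN-DC" placeholder are assumptions for illustration and are not taken from the Cassandra code base. It only shows how clearing an endpoint's saved state could make a lookup return a DC name that the keyspace does not list.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a gossip-backed snitch; names are illustrative only.
final class GossipSnitchSketch {
    // Assumed placeholder returned when no state is known for an endpoint.
    static final String PLACEHOLDER_DC = "UNKNOWN-DC";

    // Stand-in for the per-endpoint state map; maps endpoint address to DC name.
    private final Map<String, String> dcByEndpoint = new ConcurrentHashMap<>();

    // Records the DC an endpoint advertised via gossip.
    void onGossip(String endpoint, String dc) {
        dcByEndpoint.put(endpoint, dc);
    }

    // Models the suspected reset: the endpoint's previous data is dropped.
    void resetEndpoint(String endpoint) {
        dcByEndpoint.remove(endpoint);
    }

    // Returns the advertised DC, or the placeholder while the state is missing.
    String getDatacenter(String endpoint) {
        String dc = dcByEndpoint.get(endpoint);
        return dc != null ? dc : PLACEHOLDER_DC;
    }

    public static void main(String[] args) {
        GossipSnitchSketch snitch = new GossipSnitchSketch();
        snitch.onGossip("10.0.0.1", "us-east");
        System.out.println(snitch.getDatacenter("10.0.0.1"));
        snitch.resetEndpoint("10.0.0.1");
        // Until the next gossip round repopulates the entry, callers see the placeholder.
        System.out.println(snitch.getDatacenter("10.0.0.1"));
    }
}
```

In this model the wrong DC is only visible between the reset and the next gossip update, which would be consistent with the failures being brief.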
Here's the server-side stacktrace:
{code}
ERROR [RPC-Thread:150339] 2013-05-08 17:29:55,048 Cassandra.java (line 3462) Internal error processing batch_mutate
java.lang.NullPointerException
    at org.apache.cassandra.service.DatacenterSyncWriteResponseHandler.assureSufficientLiveNodes(DatacenterSyncWriteResponseHandler.java:109)
    at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:253)
    at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:194)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:639)
    at org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:590)
    at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:598)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3454)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
    at org.apache.cassandra.thrift.CustomTHsHaServer$Invocation.run(CustomTHsHaServer.java:105)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{code}
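
For readers who don't have the source open, the following is a rough sketch of how a per-DC live-node count can hit an NPE like the one above, next to one possible guard. It is a simplified model under stated assumptions, not the actual DSWRH code and not necessarily what the attached patches do: the class and method names are hypothetical, and DCs are plain strings rather than values resolved through a snitch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a per-datacenter live-node check; not the actual DSWRH code.
final class DcLiveNodeSketch {

    // Unguarded: assumes every reported DC is one of the keyspace's DCs.
    static Map<String, AtomicInteger> countLiveUnguarded(Map<String, Integer> requiredPerDc,
                                                         List<String> liveEndpointDcs) {
        Map<String, AtomicInteger> live = newCounters(requiredPerDc);
        for (String dc : liveEndpointDcs) {
            // get() returns null for a DC the keyspace does not list, so this throws an NPE.
            live.get(dc).incrementAndGet();
        }
        return live;
    }

    // Guarded: endpoints whose DC is not listed in the keyspace are not counted.
    static Map<String, AtomicInteger> countLiveGuarded(Map<String, Integer> requiredPerDc,
                                                       List<String> liveEndpointDcs) {
        Map<String, AtomicInteger> live = newCounters(requiredPerDc);
        for (String dc : liveEndpointDcs) {
            AtomicInteger counter = live.get(dc);
            if (counter != null) {
                counter.incrementAndGet();
            }
        }
        return live;
    }

    // True only if every keyspace DC has at least its required number of live endpoints.
    static boolean hasSufficientLiveNodes(Map<String, Integer> requiredPerDc,
                                          Map<String, AtomicInteger> live) {
        for (Map.Entry<String, Integer> e : requiredPerDc.entrySet()) {
            if (live.get(e.getKey()).get() < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    // One zeroed counter per DC configured in the keyspace.
    private static Map<String, AtomicInteger> newCounters(Map<String, Integer> requiredPerDc) {
        Map<String, AtomicInteger> live = new HashMap<>();
        for (String dc : requiredPerDc.keySet()) {
            live.put(dc, new AtomicInteger());
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, Integer> required = new HashMap<>();
        required.put("us-east", 2);
        required.put("us-west", 2);
        // One endpoint reports a DC ("bogus-dc") that the keyspace does not list.
        List<String> reported = Arrays.asList("us-east", "us-east", "us-west", "bogus-dc");
        System.out.println(hasSufficientLiveNodes(required, countLiveGuarded(required, reported)));
        try {
            countLiveUnguarded(required, reported);
        } catch (NullPointerException e) {
            System.out.println("unguarded count threw NPE");
        }
    }
}
```

Skipping the unknown DC is only one option. With the guard, the write in this model is rejected for lack of live nodes rather than failing with an internal error, which is arguably the clearer failure for the client.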
> Possible NPE on EACH_QUORUM writes
> ----------------------------------
>
> Key: CASSANDRA-5498
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5498
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.10
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Labels: each_quorum, ec2
> Fix For: 1.2.6
>
> Attachments: 5498-v1.patch, 5498-v2.patch
>
>
> When upgrading from 1.0 to 1.1, we observed that
> DatacenterSyncWriteResponseHandler.assureSufficientLiveNodes() can throw an
> NPE if one of the writeEndpoints has a DC that is not listed in the keyspace
> while one of the nodes is down. We observed this while running in EC2, and
> using the Ec2Snitch. The exception was typically brief, but a certain
> segment of writes (using EACH_QUORUM) failed during that time.
> This ticket will address the NPE in DSWRH, while a followup ticket will be
> created once we get to the bottom of the incorrect DC being reported from
> Ec2Snitch.