[
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739100#comment-16739100
]
Jay Zhuang commented on CASSANDRA-14525:
----------------------------------------
I'm sorry for not committing the dtest change. Will do that ASAP (I'm still
trying to confirm a few flaky tests are not introduced by the changes).
> streaming failure during bootstrap makes new node into inconsistent state
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Major
> Fix For: 2.2.14, 3.0.18, 3.11.4, 4.0
>
>
> If bootstrap fails for newly joining node (most common reason is due to
> streaming failure) then Cassandra state remains in {{joining}} state which is
> fine but Cassandra also enables Native transport which makes overall state
> inconsistent. This further creates NullPointer exception if auth is enabled
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException:
> org.apache.cassandra.streaming.StreamException: Stream failed
> at
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
> ~[guava-18.0.jar:na]
> at
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
> ~[guava-18.0.jar:na]
> at
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> ~[guava-18.0.jar:na]
> at
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695)
> [apache-cassandra-3.0.16.jar:3.0.16]
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
> at
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
> ~[guava-18.0.jar:na]
> at
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
> ~[guava-18.0.jar:na]
> at
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> ~[guava-18.0.jar:na]
> at
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> ~[guava-18.0.jar:na]
> at
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
> ~[guava-18.0.jar:na]
> at
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
> will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not
> call [StorageService.java::finishJoiningRing
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
> and as a result
> [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
> will not be invoked.
> API [StorageService.java::joinTokenRing
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L763]
> returns without any problem. After this
> [CassandraDaemon.java::start|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L584]
> is invoked which starts native transport at
> [CassandraDaemon.java::startNativeTransport
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L478]
> At this point daemon’s bootstrap is still not finished and transport is
> enabled. So client will connect to the node and will encounter
> {{java.lang.NullPointerException}} as following:
> {quote}ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception
> during request; channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
> java.lang.NullPointerException: null
> at
> org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78)
> ~[apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
> at
> io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
> [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
> at
> io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159)
> [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_121]
> at
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> [apache-cassandra-3.0.16.jar:3.0.16]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {quote}
> At this point if we run {{nodetool status}} then it will show this new node
> in {{UJ}} state, however clients can connect to this node over {{CQL}} and
> will receive {{java.lang.NullPointerException}}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]