[https://issues.apache.org/jira/browse/CASSANDRA-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843308#comment-13843308]
Oliver Seiler commented on CASSANDRA-6349:
------------------------------------------
I suspect these changes introduced an infinite loop when the ServerSocket gets
closed (I'm not sure how that is happening, though). We've been seeing major
problems with Cassandra 2.0.3 when a new cluster comes up for the first
time, and they seem to be a result of this. With logging set to DEBUG,
system.log is getting pummelled with these exception messages:
{noformat}
DEBUG [ACCEPT-localhost-grid/10.96.99.178] 2013-12-06 22:55:39,759 MessagingService.java (line 905) Error reading the socket null
java.net.SocketException: Socket closed
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
	at java.net.ServerSocket.implAccept(Unknown Source)
	at sun.security.ssl.SSLServerSocketImpl.accept(Unknown Source)
	at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:865)
{noformat}
It looks like once it is in this state, nothing will break it out. Before this
change, the IOException catch block threw another exception; now it just keeps
looping, using the (seemingly closed) ServerSocket. Restarting Cassandra seems
to be the only way to resolve this. I'll probably recommend that we drop back
to 2.0.2 until this problem is fixed (or until we understand why the
ServerSocket is closed...).
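The failure mode described above can be illustrated with a minimal, hypothetical accept loop. The class `AcceptLoopSketch` and its `handle` method are invented for illustration; this is not Cassandra's `MessagingService.SocketThread`. The sketch assumes that once `ServerSocket.isClosed()` returns true, retrying `accept()` can only fail again, so the loop exits instead of spinning, while any other `IOException` is treated as a per-connection problem and the loop keeps accepting.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

/**
 * Hypothetical accept loop for illustration only; this is NOT Cassandra's
 * MessagingService.SocketThread.
 */
public class AcceptLoopSketch implements Runnable {
    private final ServerSocket server;

    public AcceptLoopSketch(ServerSocket server) {
        this.server = server;
    }

    @Override
    public void run() {
        while (true) {
            try {
                Socket client = server.accept();
                handle(client);
            } catch (IOException e) {
                if (server.isClosed()) {
                    // The listening socket itself is gone: every further accept()
                    // would throw "Socket closed" again, so exit instead of spinning.
                    return;
                }
                // Otherwise the failure concerns a single connection: keep accepting.
            }
        }
    }

    // Hypothetical per-connection handler; a real server would hand the socket
    // to a connection thread. Here it simply closes it.
    private void handle(Socket client) throws IOException {
        client.close();
    }
}
```

A caller would typically run this on its own thread and call `server.close()` during shutdown; the `isClosed()` check is what turns that close into a clean exit rather than the repeated "Socket closed" log entries shown above.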
> IOException in MessagingService.run() causes orphaned storage server socket
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-6349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6349
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: cassandra 2.0+
> Reporter: Steven Halaka
> Assignee: Mikhail Stepura
> Fix For: 2.0.3
>
> Attachments: CASSANDRA-2.0-6349.patch
>
>
> The refactoring that reads the message header in MessagingService.run() instead
> of in IncomingTcpConnection seems to mishandle IOException: the loop is broken,
> and MessagingService.SocketThread never seems to get reinitialized.
> To reproduce: telnet to port 7000 and send random data. This then prevents
> any new or restarting node in the cluster from handshaking with this defunct
> storage port.
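The original report amounts to asking that header errors from a misbehaving client (such as a telnet session sending random data) stay confined to that one connection. The following is a minimal sketch under stated assumptions: the class `HeaderCheckingAcceptor`, the single 4-byte header, the `EXPECTED_MAGIC` value and the 2-second read timeout are all hypothetical and do not reflect Cassandra's actual wire protocol or code.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Cassandra's code): read a header from each accepted
 * connection and reject bad clients without leaving the accept loop.
 */
public class HeaderCheckingAcceptor implements Runnable {
    // Made-up magic number for this sketch only; not Cassandra's protocol magic.
    public static final int EXPECTED_MAGIC = 0xCAFE1234;

    private final ServerSocket server;
    public final AtomicInteger accepted = new AtomicInteger();
    public final AtomicInteger rejected = new AtomicInteger();

    public HeaderCheckingAcceptor(ServerSocket server) {
        this.server = server;
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            Socket socket;
            try {
                socket = server.accept();
            } catch (IOException e) {
                // The loop condition ends the thread once the server socket is closed.
                continue;
            }
            // Errors below concern only this client; they must not break the loop.
            try (Socket s = socket) {
                // Don't let a silent client block the acceptor forever.
                s.setSoTimeout(2000);
                DataInputStream in = new DataInputStream(s.getInputStream());
                if (in.readInt() == EXPECTED_MAGIC) {
                    accepted.incrementAndGet();
                } else {
                    rejected.incrementAndGet(); // garbage header: drop this client only
                }
            } catch (IOException e) {
                // e.g. EOF or timeout before a full header arrived
                rejected.incrementAndGet();
            }
        }
    }
}
```

Because the per-client `try` block sits inside the loop, a garbage header only closes that client's socket, and the acceptor keeps serving the nodes that connect afterwards, which is the behaviour the report says was lost.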
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)