[
https://issues.apache.org/jira/browse/CASSANDRA-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485887#comment-14485887
]
Tyler Hobbs commented on CASSANDRA-9132:
----------------------------------------
While looking into failures in the similar
{{replace_address_test.TestReplaceAddress.resumable_replace_test}}, I found
what looks like the problem: after the stream breaks, a {{Retry}} message is
sent, but the send fails and the failure is swallowed by
{{OutboundTcpConnection}}. Here are the relevant debug-level logs from node4
(which is replacing node3; node1 is killed during streaming):
{noformat}
WARN [STREAM-IN-/127.0.0.1] 2015-04-08 14:39:22,312 StreamSession.java:
[Stream #f39354c0-de26-11e4-ae5c-6b09a6cc3d5a] Retrying for following error
java.io.IOError: java.io.IOException: EOF in 52430 byte (compressed) block:
could only read 12647 bytes
at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:56)
~[main/:na]
at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46)
~[main/:na]
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
~[guava-16.0.jar:na]
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
~[guava-16.0.jar:na]
at
org.apache.cassandra.io.sstable.format.big.BigTableWriter.appendFromStream(BigTableWriter.java:227)
~[main/:na]
at
org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:161)
~[main/:na]
at org.apache.cassandra.streaming.StreamReader.read(StreamReader.java:104)
~[main/:na]
at
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48)
[main/:na]
at
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38)
[main/:na]
at
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
[main/:na]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251)
[main/:na]
at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]
Caused by: java.io.IOException: EOF in 52430 byte (compressed) block: could
only read 12647 bytes
at com.ning.compress.lzf.LZFDecoder.readFully(LZFDecoder.java:394)
~[compress-lzf-0.8.4.jar:na]
at com.ning.compress.lzf.LZFDecoder.decompressChunk(LZFDecoder.java:190)
~[compress-lzf-0.8.4.jar:na]
at
com.ning.compress.lzf.LZFInputStream.readyBuffer(LZFInputStream.java:254)
~[compress-lzf-0.8.4.jar:na]
at com.ning.compress.lzf.LZFInputStream.read(LZFInputStream.java:129)
~[compress-lzf-0.8.4.jar:na]
at java.io.DataInputStream.readFully(DataInputStream.java:195)
~[na:1.7.0_40]
at java.io.DataInputStream.readFully(DataInputStream.java:169)
~[na:1.7.0_40]
at
org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:94)
~[main/:na]
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:360)
~[main/:na]
at
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:320)
~[main/:na]
at
org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:132)
~[main/:na]
at
org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:86)
~[main/:na]
at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52)
~[main/:na]
... 11 common frames omitted
DEBUG [STREAM-OUT-/127.0.0.1] 2015-04-08 14:39:22,313 ConnectionHandler.java:
[Stream #f39354c0-de26-11e4-ae5c-6b09a6cc3d5a] Sending Retry
(a6c9e410-de26-11e4-a645-6b09a6cc3d5a, #0)
DEBUG [WRITE-/127.0.0.1] 2015-04-08 14:39:23,315 OutboundTcpConnection.java:
error writing to /127.0.0.1
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.7.0_40]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
~[na:1.7.0_40]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.7.0_40]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.7.0_40]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
~[na:1.7.0_40]
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
~[na:1.7.0_40]
at java.nio.channels.Channels.writeFully(Channels.java:98) ~[na:1.7.0_40]
at java.nio.channels.Channels.access$000(Channels.java:61) ~[na:1.7.0_40]
at java.nio.channels.Channels$1.write(Channels.java:174) ~[na:1.7.0_40]
at
net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
~[lz4-1.3.0.jar:na]
at
net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:223)
~[lz4-1.3.0.jar:na]
at
org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66)
~[main/:na]
at
org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:289)
[main/:na]
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:219)
[main/:na]
DEBUG [WRITE-/127.0.0.1] 2015-04-08 14:39:23,316 OutboundTcpConnection.java:
attempting to connect to /127.0.0.1
{noformat}
After that, no more retry attempts are made.
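The failure mode described above can be illustrated with a minimal sketch. This is not Cassandra's code: {{FlakyConnection}}, {{send_swallowing}}, and {{send_with_retry}} are hypothetical names, and the model simply assumes that a failed write surfaces as an exception. The first sender logs the error at debug level and drops the message, so the caller never learns that the retry request was lost. The second reports the outcome so the caller can try again or fail the session.

```python
import logging

logger = logging.getLogger("stream_sketch")


class FlakyConnection:
    """Hypothetical connection whose first `failures` writes raise a broken-pipe error."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def write(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise BrokenPipeError("Broken pipe")
        self.delivered.append(message)


def send_swallowing(conn, message):
    """Log the write error and drop the message; the caller sees no failure."""
    try:
        conn.write(message)
    except OSError as exc:
        logger.debug("error writing: %s", exc)


def send_with_retry(conn, message, max_attempts=3):
    """Resend until a write succeeds or attempts run out; return True on success."""
    for attempt in range(1, max_attempts + 1):
        try:
            conn.write(message)
            return True
        except OSError as exc:
            logger.debug("attempt %d failed: %s", attempt, exc)
    return False
```

With a connection that fails once, {{send_swallowing}} leaves {{delivered}} empty and gives the caller no signal, while {{send_with_retry}} delivers the message on its second attempt.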
> resumable_bootstrap_test can hang
> ---------------------------------
>
> Key: CASSANDRA-9132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9132
> Project: Cassandra
> Issue Type: Bug
> Components: Tests
> Reporter: Tyler Hobbs
> Assignee: Yuki Morishita
>
> The {{bootstrap_test.TestBootstrap.resumable_bootstrap_test}} test sometimes
> hangs. It looks like the following line never completes:
> {noformat}
> node3.watch_log_for("Listening for thrift clients...")
> {noformat}
> I'm not familiar enough with the recent bootstrap changes to know why that's
> not happening.
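One way to turn a hang like the one quoted above into a visible test failure is to bound the wait. The sketch below is not ccm's {{watch_log_for}}; {{wait_for_log_line}} is a hypothetical helper that assumes the node's log is a plain text file, and its timeout and polling interval are illustrative defaults.

```python
import re
import time


def wait_for_log_line(path, pattern, timeout=60.0, poll_interval=0.5):
    """Return the first line in `path` matching `pattern`, or raise TimeoutError.

    The file is re-read on every poll, which is simple but only suitable
    for small logs.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "r", errors="replace") as log:
                for line in log:
                    if regex.search(line):
                        return line.rstrip("\n")
        except FileNotFoundError:
            pass  # the node may not have created its log file yet
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{pattern!r} not seen in {path} within {timeout}s"
            )
        time.sleep(poll_interval)
```

A test could call it with {{re.escape("Listening for thrift clients...")}} as the pattern, so that the dots are matched literally, and fail with a clear error if the message never appears.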