[ 
https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302740#comment-15302740
 ] 

Paulo Motta commented on CASSANDRA-10992:
-----------------------------------------

From the thread dump, it seems the stream session is hung on
{{StreamReader.drain}}, more specifically in {{CompressedInputStream.read}},
which blocks forever on {{ArrayBlockingQueue.take()}}:

{noformat}
Thread 16969: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2039 (Compiled frame)
 - java.util.concurrent.ArrayBlockingQueue.take() @bci=20, line=403 (Compiled frame)
 - org.apache.cassandra.streaming.compress.CompressedInputStream.read() @bci=31, line=95 (Compiled frame)
 - java.io.InputStream.read(byte[], int, int) @bci=43, line=170 (Compiled frame)
 - java.io.InputStream.skip(long) @bci=44, line=224 (Interpreted frame)
 - org.apache.cassandra.streaming.StreamReader.drain(java.io.InputStream, long) @bci=11, line=158 (Interpreted frame)
 - org.apache.cassandra.streaming.compress.CompressedStreamReader.read(java.nio.channels.ReadableByteChannel) @bci=577, line=129 (Compiled frame)
 - org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=64, line=48 (Compiled frame)
 - org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=4, line=38 (Compiled frame)
 - org.apache.cassandra.streaming.messages.StreamMessage.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=41, line=56 (Compiled frame)
 - org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run() @bci=24, line=257 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
{noformat}

{{CompressedInputStream}} uses an auxiliary thread that reads compressed chunks
from the socket stream and adds them to a data buffer queue, which is consumed
by {{CompressedStreamReader}} during reads. If there is an exception reading
from the socket, the reader thread adds a poison pill to the data buffer queue,
which causes the next read to throw an {{IOException}}. Upon receiving that
exception, {{CompressedStreamReader}} tries to drain the socket, which performs
an additional read on the now-empty data buffer queue. Since nothing else will
ever be added to the queue, that read blocks forever, causing the stream
session to hang.
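To make the failure sequence concrete, here is a minimal Java model of it. It is a hypothetical sketch, not Cassandra's code: the class and method names ({{PoisonPillHangDemo}}, {{readChunk}}, {{drainAfterFailureBlocks}}) are made up, the local queue stands in for the data buffer, and a timed {{poll()}} replaces {{ArrayBlockingQueue.take()}} so that the demo returns instead of hanging.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of the hang; not Cassandra's actual code.
 * A timed poll() stands in for ArrayBlockingQueue.take() so the demo terminates.
 */
public class PoisonPillHangDemo {
    // Sentinel the reader thread enqueues when reading from the socket fails.
    static final byte[] POISON_PILL = new byte[0];

    // Mimics CompressedInputStream.read(): returns null where take() would block forever.
    static byte[] readChunk(BlockingQueue<byte[]> buffer, long timeoutMs)
            throws IOException, InterruptedException {
        byte[] chunk = buffer.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (chunk == POISON_PILL) {
            throw new IOException("error reading from socket");
        }
        return chunk;
    }

    /** Returns true if the post-failure "drain" read finds the queue empty. */
    static boolean drainAfterFailureBlocks() throws InterruptedException {
        BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(4);

        // Reader thread: delivers one chunk, then hits a socket error and posts the pill.
        Thread reader = new Thread(() -> {
            try {
                buffer.put(new byte[] {1, 2, 3});
                buffer.put(POISON_PILL); // posted once; the thread then exits
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        reader.join();

        try {
            readChunk(buffer, 100); // first chunk is read normally
            readChunk(buffer, 100); // consumes the pill and throws
            throw new AssertionError("expected an IOException");
        } catch (IOException expected) {
            // The "drain" step: one more read on a queue nobody will refill.
            try {
                return readChunk(buffer, 200) == null;
            } catch (IOException unexpected) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("drain blocks: " + drainAfterFailureBlocks());
    }
}
```

The key point the model shows is that the poison pill is consumed exactly once, so any read after the {{IOException}} waits on a queue with no remaining producer.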

From my understanding, we only drain the socket so that the stream session can
be retried later. But since we never retry on {{IOException}}, we shouldn't try
to drain the socket when getting an {{IOException}} from
{{CompressedInputStream}}. WDYT [~yukim]?
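The proposal could look roughly like the following Java sketch. It is hypothetical: {{DrainPolicySketch}}, {{SectionReader}}, {{readSection}} and {{Outcome}} are invented for illustration, it assumes the caller knows how many unread bytes remain in the section, and where the real code would rethrow, this sketch returns an {{Outcome}} so the branch taken is visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch of the proposal above; not Cassandra's code.
 * On IOException the stream is left alone; otherwise the rest of the
 * section is drained so a later retry starts at the next message.
 */
public class DrainPolicySketch {
    /** Stand-in for deserializing one file section (hypothetical). */
    interface SectionReader {
        void read(InputStream in) throws IOException;
    }

    enum Outcome { OK, FAILED_NO_DRAIN, FAILED_DRAINED }

    /** Skips up to {@code length} bytes; returns how many were actually skipped. */
    static long drain(InputStream in, long length) throws IOException {
        long skipped = 0;
        while (skipped < length) {
            long n = in.skip(length - skipped);
            if (n <= 0) {
                break; // nothing more to skip (e.g. end of stream)
            }
            skipped += n;
        }
        return skipped;
    }

    /** {@code remaining} is the number of unread bytes in the section (assumed known). */
    static Outcome readSection(InputStream in, long remaining, SectionReader reader)
            throws IOException {
        try {
            reader.read(in);
            return Outcome.OK;
        } catch (IOException e) {
            // Transport failure: the session is not retried, and another read
            // on the stream could block forever, so do not drain.
            return Outcome.FAILED_NO_DRAIN;
        } catch (RuntimeException e) {
            // Failure while the stream is assumed to be healthy: drain the rest
            // of the section so the stream stays aligned for a retry.
            drain(in, remaining);
            return Outcome.FAILED_DRAINED;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] section = new byte[16];
        System.out.println(readSection(new ByteArrayInputStream(section), 16,
                in -> { throw new IOException("socket closed"); }));
        System.out.println(readSection(new ByteArrayInputStream(section), 16,
                in -> { throw new IllegalStateException("bad partition"); }));
    }
}
```

Splitting on the exception type keeps the drain (which only makes sense for a retry) away from the one case where the stream is known to be unusable.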

We should perhaps go further in a separate ticket and reconsider the stream
retry mechanism: is there any situation where retry actually works?

[~mlowicki], do you see any {{Error while reading compressed input stream}} or
{{Error while reading partition}} warnings in system.log?
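One way to check is a small Java scan like the sketch below. It is only a sketch: the class name is hypothetical, the default path {{/var/log/cassandra/system.log}} is an assumption (the usual location for the Debian package), and the two search strings are just the fragments quoted above, not necessarily the complete log messages.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: prints system.log lines containing the quoted warnings. */
public class StreamWarningScan {
    // Message fragments quoted in the comment above.
    static final List<String> PATTERNS = Arrays.asList(
            "Error while reading compressed input stream",
            "Error while reading partition");

    /** Returns the lines that contain any of the patterns. */
    static List<String> matchingLines(List<String> lines) {
        return lines.stream()
                .filter(line -> PATTERNS.stream().anyMatch(line::contains))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumed default log location; pass a different path as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "/var/log/cassandra/system.log");
        for (String line : matchingLines(Files.readAllLines(log, StandardCharsets.UTF_8))) {
            System.out.println(line);
        }
    }
}
```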

> Hanging streaming sessions
> --------------------------
>
>                 Key: CASSANDRA-10992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10992
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.12, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Paulo Motta
>             Fix For: 2.1.12
>
>         Attachments: apache-cassandra-2.1.12-SNAPSHOT.jar, db1.ams.jstack, db6.analytics.jstack
>
>
> I've recently started running repair using [Cassandra Reaper|https://github.com/spotify/cassandra-reaper]
> (built-in {{nodetool repair}} doesn't work for me: CASSANDRA-9935). It
> behaves fine, but I've noticed hanging streaming sessions:
> {code}
> root@db1:~# date
> Sat Jan  9 16:43:00 UTC 2016
> root@db1:~# nt netstats -H | grep total
>         Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
>         Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
>         Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total
>         Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total
>         Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total
>         Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total
>         Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total
>         Sending 5 files, 15.32 MB total. Already sent 5 files, 15.32 MB total
>         Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total
>         Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total
> root@db1:~# date
> Sat Jan  9 17:45:42 UTC 2016
> root@db1:~# nt netstats -H | grep total
>         Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
>         Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
>         Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total
>         Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total
>         Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total
>         Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total
>         Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total
>         Sending 5 files, 15.32 MB total. Already sent 5 files, 15.32 MB total
>         Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total
>         Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total
> {code}
> Such sessions remain even long after the repair job has finished (confirmed
> by checking Reaper's and Cassandra's logs). {{streaming_socket_timeout_in_ms}}
> in cassandra.yaml is set to the default value (3600000).
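As a quick sanity check on the numbers in the report, the Java sketch below compares the gap between the two {{date}} timestamps with the configured {{streaming_socket_timeout_in_ms}}. The class and method names are hypothetical. The snapshots are 3762 s apart, which is longer than the 60-minute timeout, yet none of the counters moved. This only shows the sessions made no progress for longer than the timeout; the thread dump shows the stuck thread parked in {{ArrayBlockingQueue.take()}} rather than in a socket read, which would presumably explain why a socket timeout never fires for it.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical check: compares the netstats snapshot gap with the socket timeout. */
public class StreamTimeoutCheck {
    // streaming_socket_timeout_in_ms as reported (the default, 3600000 ms).
    static final Duration SOCKET_TIMEOUT = Duration.ofMillis(3_600_000L);

    // Timestamps printed by `date` before each netstats snapshot (same day, UTC).
    static Duration snapshotGap() {
        return Duration.between(LocalTime.of(16, 43, 0), LocalTime.of(17, 45, 42));
    }

    public static void main(String[] args) {
        Duration gap = snapshotGap();
        System.out.println("timeout: " + SOCKET_TIMEOUT.toMinutes() + " min");
        System.out.println("gap:     " + gap.getSeconds() + " s");
        System.out.println("gap exceeds timeout: " + (gap.compareTo(SOCKET_TIMEOUT) > 0));
    }
}
```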



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
