[ 
https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004707#comment-13004707
 ] 

Aaron Morton commented on CASSANDRA-2290:
-----------------------------------------

Not sure if this helps, but I found a place where AES was hanging while I was 
testing failure during a streaming transfer for CASSANDRA-2088. I modified 
FileStreamTask to send only one range and then close the sending channel. 

IncomingStreamReader.readFile() got stuck in an infinite loop because it does 
not check the return value of FileChannel.transferFrom(), which was returning 
0 bytes read. FileStreamTask likewise does not check the number of bytes sent 
by transferTo().
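
A minimal sketch of the kind of check I mean, not the actual Cassandra code. 
The class and method names (CheckedTransfer, receiveFully, sendFully) are 
hypothetical, and it assumes blocking channels, where a return of 0 means no 
further progress is possible rather than "try again later":

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class CheckedTransfer {
    // Hypothetical helper: pull exactly `length` bytes from `source` into `file`.
    static long receiveFully(ReadableByteChannel source, FileChannel file, long length)
            throws IOException {
        long received = 0;
        while (received < length) {
            long n = file.transferFrom(source, received, length - received);
            if (n <= 0) {
                // Assumption: blocking source, so no bytes means the peer is gone.
                throw new EOFException("stream ended after " + received
                        + " of " + length + " bytes");
            }
            received += n;
        }
        return received;
    }

    // Hypothetical helper: push `length` bytes of `file`, starting at `offset`, to `target`.
    static long sendFully(FileChannel file, long offset, long length, WritableByteChannel target)
            throws IOException {
        long sent = 0;
        while (sent < length) {
            long n = file.transferTo(offset + sent, length - sent, target);
            if (n <= 0) {
                // Same assumption as above: fail instead of spinning on zero progress.
                throw new IOException("sent only " + sent + " of " + length + " bytes");
            }
            sent += n;
        }
        return sent;
    }
}
```

With a check like this the receiver would fail the stream instead of looping 
forever once the sender goes away. A non-blocking channel would need 
different handling, since 0 can be a normal result there.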

While it was stuck in the loop, the socket it was reading from was in 
CLOSE_WAIT (127.0.0.1 was in the loop, 127.0.0.2 was sending): 
java      25371 aaron   73u  IPv4 0xffffff8010742ff8      0t0  TCP 127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)

While I was debugging, the SocketChannel still reported that it was open. 
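
That matches how SocketChannel.isOpen() behaves: it only reflects whether the 
local side has closed the channel, not whether the peer has. A small 
loopback sketch of this, assuming blocking mode (PeerCloseProbe and probe are 
hypothetical names for illustration only):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class PeerCloseProbe {
    // Returns { isOpen() after the peer closed, whether read() reported end-of-stream }.
    static boolean[] probe() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Bind to an ephemeral port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                // The "remote" side accepts the connection and closes it immediately.
                server.accept().close();
                // A blocking read returns -1 once the peer's close has been received.
                int n = client.read(ByteBuffer.allocate(16));
                return new boolean[] { client.isOpen(), n == -1 };
            }
        }
    }
}
```

So the only reliable signal on the reading side is the result of the read (or 
a transfer that makes no progress), not isOpen().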

> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
>                 Key: CASSANDRA-2290
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 0.7.4
>
>         Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Repair doesn't cope well with dead/dying neighbors. There are 2 problems:
>   # Repair doesn't check if a node is dead before sending a TreeRequest; 
> this is easily fixable.
>   # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with. The best approach is probably 
> CASSANDRA-1740, however. That is, if we add a way to query the state of a 
> repair, have that query correctly check all neighbors, and also add a way 
> to cancel a repair, this would probably be enough.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
