[ 
https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-1169:
---------------------------------------

    Attachment: aes.txt

I have upgraded to 0.6.2 because streaming in 0.6.1 would randomly time out on 
me. I am still having issues with move, join, and repair. Since I was having so 
many streaming problems, I turned up logging to capture them. Over the past few 
weeks I have spent a lot of time managing my clusters. I try to do these types 
of operations in the AM so they have less performance impact, but I have a very 
low success rate with any move, join, or repair. I have a growing list of nodes 
to join and ring management tasks that I keep having to put off due to 
failures. Anything that makes these processes less brittle would be a very big 
deal. The output is attached.
 

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3
>
>         Attachments: aes.txt
>
>
> Streaming service assumes there will only be one stream from S to T at a time 
> for any nodes S and T.  For the original purpose of node movement, this was a 
> reasonable assumption (any node T can only perform one move at a time) but 
> AES throws off streaming tasks much more frequently than that given the right 
> conditions, which will de-sync the fragile file ordering that Streaming 
> assumes (that T knows which files S is going to send, in what order).  
> Eventually T is expecting file F1 but S sends a smaller file F2, leading to 
> an infinite loop on T while it waits for F1 to finish, and S waits for T to 
> acknowledge F2, which it never will.
> For 0.6 maybe the best solution is for AES to manually wait for one of its 
> streaming tasks to finish, before it allows itself to create another.  For 
> 0.7 it would be nice to make Streaming more robust.  The whole 4-stage-ack 
> process seems very fragile, and poking around in parent objects via 
> inetaddress keys makes reasoning about small pieces impossible b/c of 
> encapsulation violations.
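
The 0.6 workaround described above (have AES wait for one streaming task to 
finish before starting another) could be sketched roughly as follows. This is 
only an illustration, not Cassandra code: StreamGate, awaitTurn, tryTurn and 
streamFinished are hypothetical names, and the target node is identified by a 
plain string rather than whatever key the real streaming code uses.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical gate that allows at most one in-flight stream per target node.
 * Not part of Cassandra; all names here are assumptions for illustration.
 */
final class StreamGate {
    // One single-permit semaphore per target node.
    private final ConcurrentMap<String, Semaphore> gates = new ConcurrentHashMap<>();

    private Semaphore gateFor(String target) {
        return gates.computeIfAbsent(target, t -> new Semaphore(1, true));
    }

    /** Blocks until no other stream to this target is in flight. */
    void awaitTurn(String target) throws InterruptedException {
        gateFor(target).acquire();
    }

    /** Non-blocking variant: returns false if a stream to this target is in flight. */
    boolean tryTurn(String target) {
        return gateFor(target).tryAcquire();
    }

    /**
     * Must be called exactly once per granted turn, when the stream completes
     * or fails, so the next queued stream to the same target can start.
     */
    void streamFinished(String target) {
        gateFor(target).release();
    }
}
```

A caller would take a turn before handing files to the streaming service and 
call streamFinished from whatever completion or failure callback it has, so 
that T only ever sees one ordered batch of files from S at a time. This 
serializes streams rather than making the ack protocol itself more robust.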

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
