[ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Capriolo updated CASSANDRA-1169:
---------------------------------------
Attachment: aes.txt
I have upgraded to 0.6.2 because streaming in 0.6.1 would randomly time out on
me. I am still having issues with move, join, and repair. Since I was having so
many streaming problems, I turned this up in some logs. Over the past few weeks
I have spent a lot of time managing my clusters. I try to do these types of
operations in the morning so they have less performance impact, but I have a
very low success rate with move, join, and repair. I have a growing list of
nodes to join and ring-management tasks that I keep having to put off because
of these failures, so anything that makes these processes less brittle would be
a very big deal. The output is attached.
> AES makes Streaming unhappy
> ---------------------------
>
> Key: CASSANDRA-1169
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Gary Dusbabek
> Priority: Critical
> Fix For: 0.6.3
>
> Attachments: aes.txt
>
>
> Streaming service assumes there will only be one stream from S to T at a time
> for any nodes S and T. For the original purpose of node movement, this was a
> reasonable assumption (any node T can only perform one move at a time) but
> AES (the anti-entropy service) kicks off streaming tasks much more frequently
> than that under the right conditions, which will desync the fragile file
> ordering that Streaming assumes (that T knows which files S is going to send,
> and in what order).
> Eventually T is expecting file F1 but S sends a smaller file F2, leading to
> an infinite loop on T while it waits for F1 to finish, while S waits for T to
> acknowledge F2, which it never will.
> For 0.6 maybe the best solution is for AES to manually wait for one of its
> streaming tasks to finish, before it allows itself to create another. For
> 0.7 it would be nice to make Streaming more robust. The whole four-stage ack
> process seems very fragile, and poking around in parent objects via
> InetAddress keys makes it impossible to reason about the small pieces because
> of the encapsulation violations.
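The failure mode described above, and the suggested 0.6 workaround (AES waits for one streaming session to a target to finish before starting another), can be modelled in a few lines of Java. This is a toy sketch under stated assumptions, not Cassandra's streaming code: Receiver, SerializedStreamer, and the per-target Semaphore are hypothetical names chosen for illustration, and the real four-stage ack exchange is reduced to a boolean returned by receive().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

public class StreamOrderingSketch {

    /** Hypothetical model of target T: it expects files from S in announced order. */
    static final class Receiver {
        private final Deque<String> expected = new ArrayDeque<>();

        // Each session announces its file list up front; T appends it to ONE
        // queue per source, because it assumes one session per source at a time.
        synchronized void announce(List<String> files) {
            expected.addAll(files);
        }

        // True if the arriving file is the one T is waiting for. In the reported
        // bug a mismatch is not detected: T keeps waiting for the head file while
        // S waits for an ack of the file it actually sent.
        synchronized boolean receive(String file) {
            if (file.equals(expected.peekFirst())) {
                expected.pollFirst();
                return true;
            }
            return false;
        }

        synchronized boolean idle() {
            return expected.isEmpty();
        }
    }

    /** Hypothetical sender-side guard: at most one session per target at a time. */
    static final class SerializedStreamer {
        private final ConcurrentMap<String, Semaphore> gates = new ConcurrentHashMap<>();

        void stream(String target, Receiver receiver, List<String> files)
                throws InterruptedException {
            Semaphore gate = gates.computeIfAbsent(target, k -> new Semaphore(1));
            gate.acquire(); // block until the previous session to this target finishes
            try {
                receiver.announce(files);
                for (String f : files) {
                    if (!receiver.receive(f)) {
                        throw new IllegalStateException("out of order: " + f);
                    }
                }
            } finally {
                gate.release();
            }
        }
    }

    /** Two overlapping sessions to T; S sends the second session's file first. */
    static boolean interleavedDemo() {
        Receiver t = new Receiver();
        t.announce(List.of("F1", "F3")); // session A
        t.announce(List.of("F2"));       // session B, started before A finished
        return t.receive("F2");          // T is still expecting F1
    }

    /** The same two sessions, run concurrently but serialized per target. */
    static boolean serializedDemo() {
        Receiver t = new Receiver();
        SerializedStreamer s = new SerializedStreamer();
        List<Throwable> errors = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (List<String> files : List.of(List.of("F1", "F3"), List.of("F2"))) {
            Thread th = new Thread(() -> {
                try {
                    s.stream("T", t, files);
                } catch (Exception e) {
                    errors.add(e);
                }
            });
            threads.add(th);
            th.start();
        }
        try {
            for (Thread th : threads) {
                th.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return errors.isEmpty() && t.idle();
    }

    public static void main(String[] args) {
        System.out.println("interleaved stays in order: " + interleavedDemo()); // false
        System.out.println("serialized stays in order: " + serializedDemo());   // true
    }
}
```

The per-target semaphore mirrors the "wait for one streaming task to finish before creating another" idea: sessions to different targets can still run in parallel, while sessions to the same target never interleave their announcements and transfers.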
--
This message is automatically generated by JIRA.