[
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038181#comment-13038181
]
Stu Hood commented on CASSANDRA-2433:
-------------------------------------
0001
* Since we're not trying to control throughput or monitor sessions, could we
just use Stage.MISC?
0002
* I think RepairSession.exception needs to be volatile to ensure that the
awoken thread sees it
* Would it be better if RepairSession implemented
IEndpointStateChangeSubscriber directly?
* The endpoint set needs to be threadsafe, since it will be modified by both
the endpoint state change thread and the AE_STAGE thread
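To make the two thread-safety points above concrete, here is a minimal sketch. It is not the code from the patch: the class RepairSessionSketch, its methods, and the use of plain strings for endpoints are all assumptions made for illustration. Only the java.util.concurrent classes are real.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for RepairSession; every name here is illustrative.
public class RepairSessionSketch {
    // volatile: a write from the failure-reporting thread is guaranteed to be
    // visible to the waiting thread, however that thread was woken up.
    private volatile Exception exception;

    // Concurrent set: may be modified from the endpoint state change thread
    // and from the repair stage thread without external locking.
    private final Set<String> endpoints = ConcurrentHashMap.newKeySet();

    private final CountDownLatch finished = new CountDownLatch(1);

    public void addEndpoint(String endpoint) {
        endpoints.add(endpoint);
    }

    // Called from the thread that notices an endpoint has died.
    public void onEndpointFailure(String endpoint) {
        // Only fail the session if the endpoint was actually part of it.
        if (endpoints.remove(endpoint)) {
            exception = new RuntimeException("Endpoint " + endpoint + " died");
            finished.countDown();
        }
    }

    // Called by the thread driving the repair. Returns the reported failure,
    // or null if none was reported within the timeout.
    public Exception awaitFailure(long timeout, TimeUnit unit) {
        try {
            finished.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return exception;
    }

    public static void main(String[] args) throws InterruptedException {
        RepairSessionSketch session = new RepairSessionSketch();
        session.addEndpoint("10.0.0.1");
        Thread gossip = new Thread(() -> session.onEndpointFailure("10.0.0.1"));
        gossip.start();
        Exception failure = session.awaitFailure(5, TimeUnit.SECONDS);
        gossip.join();
        System.out.println(failure == null ? "no failure" : failure.getMessage());
    }
}
```

In this sketch the latch already orders the write before the read, so volatile is the belt-and-braces part; it matters most when the wake-up mechanism gives no such guarantee.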
0003
* Should StreamInSession.retries be volatile/atomic? (likely they won't retry
quickly enough for it to be a problem, but...)
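As a sketch of the atomic option for the retry counter: the class RetryCounterSketch and the MAX_RETRIES value below are hypothetical and are not taken from StreamInSession. The point is only that volatile alone would make the count visible across threads but would not make an increment atomic, whereas AtomicInteger does both.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the retry bookkeeping; names are illustrative.
public class RetryCounterSketch {
    public static final int MAX_RETRIES = 8; // assumed limit, for illustration only

    private final AtomicInteger retries = new AtomicInteger();

    // Returns true if another retry is allowed. incrementAndGet is a single
    // atomic read-modify-write, so concurrent callers never lose an update.
    public boolean tryRetry() {
        return retries.incrementAndGet() <= MAX_RETRIES;
    }

    // Total number of retry attempts recorded so far.
    public int attempts() {
        return retries.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RetryCounterSketch counter = new RetryCounterSketch();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.tryRetry();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // With an atomic counter, all 4 * 1000 increments are recorded.
        System.out.println(counter.attempts());
    }
}
```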
0004
* Playing devil's advocate: would sending a half-built tree in case of failure
still be useful?
Thanks Sylvain!
> Failed Streams Break Repair
> ---------------------------
>
> Key: CASSANDRA-2433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benjamin Coverston
> Assignee: Sylvain Lebresne
> Labels: repair
> Fix For: 0.8.1
>
> Attachments:
> 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch,
> 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch,
> 0002-Register-in-gossip-to-handle-node-failures-v2.patch,
> 0002-Register-in-gossip-to-handle-node-failures.patch,
> 0003-Report-streaming-errors-back-to-repair-v2.patch,
> 0003-Report-streaming-errors-back-to-repair.patch,
> 0004-Reports-validation-compaction-errors-back-to-repair-v2.patch,
> 0004-Reports-validation-compaction-errors-back-to-repair.patch
>
>
> When running repair in cases where a stream fails, we are seeing multiple problems.
> 1. Although retry is initiated and completes, the old stream doesn't seem to
> clean itself up and repair hangs.
> 2. The temp files are left behind and multiple failures can end up filling up
> the data partition.
> Together, these issues are making repair very difficult for nearly everyone
> running repair on a data set of non-trivial size.
> This issue is also being worked on w.r.t. CASSANDRA-2088; however, that was
> moved to 0.8 for a few reasons. This ticket is to fix the immediate issues
> that we are seeing in 0.7.