[
https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stu Hood updated CASSANDRA-1190:
--------------------------------
Comment: was deleted
(was: 0001 through 0003 remove automatic repairs without changing the network
format.
0004 adds a session id to the network format to allow for concurrent repairs
(considering they can take many hours to complete, and we don't want trees
generated at different times to collide).
----
0001 through 0003 could be applied to 0.6, but without a column family argument
to StreamIn.requestRanges (see my comment on CASSANDRA-1189), more data will be
transferred than necessary.)
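The session-id idea in the comment above can be sketched roughly as follows. This is an illustrative sketch only, not Cassandra's actual code: the class and method names are hypothetical. The point is that once validation messages carry a session id, a merkle tree arriving for an unknown or long-finished session is simply dropped rather than colliding with trees from a newer, concurrent repair.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical router for repair-session messages. Each repair session gets a
// UUID; incoming trees are accepted only if their session id is still active.
public class SessionRouter {
    private final Set<UUID> active = new HashSet<>();

    // Begin a new repair session and return its id, which would be attached
    // to every RPC belonging to that session.
    public UUID start() {
        UUID id = UUID.randomUUID();
        active.add(id);
        return id;
    }

    // A tree tagged with an unknown session id (e.g. from a session that
    // already timed out) is rejected instead of being merged by mistake.
    public boolean accept(UUID sessionId) {
        return active.contains(sessionId);
    }

    public void finish(UUID sessionId) {
        active.remove(sessionId);
    }
}
```

With this tagging, repairs that take many hours can safely run concurrently, since trees generated at different times can no longer be confused with one another.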
> Remove automatic repair sessions
> --------------------------------
>
> Key: CASSANDRA-1190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stu Hood
> Assignee: Stu Hood
> Priority: Critical
> Fix For: 0.6.3, 0.7
>
> Attachments:
> 0001-Remove-natural-repair-throttling-in-preparation-for-.patch,
> 0002-Rename-readonly-compaction-to-validation-and-make-it.patch,
> 0003-Request-ranges-in-addition-to-sending-them.patch,
> 0004-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch
>
>
> Currently both manual and automatic repair sessions use the same timeout
> value: TREE_STORE_TIMEOUT. This has the very negative effect of capping the
> time a compaction can take before a manual repair will fail.
> For automatic/natural repairs (triggered by two nodes autonomously finishing
> major compactions around the same time), you want a relatively low
> TREE_STORE_TIMEOUT value, because trees generated a long time apart will
> cause a lot of unnecessary repair. The current value is 10 minutes, to
> optimize for this case.
> On the other hand, for manual repairs, TREE_STORE_TIMEOUT needs to be
> significantly higher. For instance, if a manual repair is triggered for a
> source node A storing 2 TB of data, and a destination node B with an empty
> store, then node B needs to wait long enough for node A to finish compacting
> 2 TB of data, which might take more than 12 hours. If node B times out its
> local tree before node A sends its tree, the repair will not occur.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.