[
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890438#comment-13890438
]
Yuki Morishita edited comment on CASSANDRA-5351 at 2/4/14 6:20 AM:
-------------------------------------------------------------------
bq. Dropping sstable to UNREPAIRED during major compaction means that all
repaired data status is cleared for the node.
That's what I meant. Major compaction currently produces a single SSTable, and
I think changing that behavior might confuse users. My opinion is to keep it
as is.
Additional review comments:
* Does PrepareMessage need to carry around dataCenters? Only the coordinator
sends out messages, so I think you can drop it (also from ParentRepairSession).
* Using the CF ID is preferred over the Keyspace name/CF name pair.
* PrepareMessage is sent per CF, which can produce a lot of round trips. Isn't
one message per replica node enough?
* I think we need to clean up parentRepairSessions when something bad happens.
Otherwise the ParentRepairSession in the map keeps references to SSTables.
I just worked on the first one above, and the commit is here (on top of your
branch):
https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361
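To illustrate the remaining review points, here is a rough sketch of one
prepare request per replica node that lists CF IDs, plus a registry entry that
is always removed when the repair fails. Every name in it
(RepairPrepareSketch, PrepareRequest, buildRequests, runRepair, the String
replica addresses) is hypothetical and made up for illustration; it is not the
code in the branch or the commit above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; none of these types are the actual Cassandra classes.
class RepairPrepareSketch {

    // One prepare request per replica, identifying column families by ID
    // rather than by a keyspace name / CF name pair.
    static final class PrepareRequest {
        final UUID parentSessionId;
        final List<UUID> cfIds;

        PrepareRequest(UUID parentSessionId, List<UUID> cfIds) {
            this.parentSessionId = parentSessionId;
            this.cfIds = Collections.unmodifiableList(new ArrayList<>(cfIds));
        }
    }

    // In-flight parent sessions, keyed by parent session ID.
    static final Map<UUID, PrepareRequest> parentSessions = new ConcurrentHashMap<>();

    // Build a single request per replica that covers all CFs, instead of
    // one message (and one round trip) per CF.
    static Map<String, PrepareRequest> buildRequests(UUID parentSessionId,
                                                     Set<String> replicas,
                                                     List<UUID> cfIds) {
        Map<String, PrepareRequest> requests = new HashMap<>();
        for (String replica : replicas) {
            requests.put(replica, new PrepareRequest(parentSessionId, cfIds));
        }
        return requests;
    }

    // Register the session, run the repair work, and always remove the entry
    // afterwards so a failed repair does not leave references behind.
    static void runRepair(UUID parentSessionId, List<UUID> cfIds, Runnable work) {
        parentSessions.put(parentSessionId, new PrepareRequest(parentSessionId, cfIds));
        try {
            work.run();
        } finally {
            parentSessions.remove(parentSessionId);
        }
    }
}
```

The try/finally is the simplest way to guarantee the cleanup; a real
implementation might instead hook into failure callbacks of the repair
session.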
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
> Key: CASSANDRA-5351
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
> Project: Cassandra
> Issue Type: Task
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Lyuben Todorov
> Labels: repair
> Fix For: 2.1
>
> Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log,
> 5351_nodetool.log
>
>
> Repair has always built its merkle tree from all the data in a columnfamily,
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been
> successfully repaired, and only repairing sstables new since the last repair.
> (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired
> data together with non-repaired. So we should segregate unrepaired sstables
> from the repaired ones.