[ https://issues.apache.org/jira/browse/CASSANDRA-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473341#comment-13473341 ]
Nick Bailey commented on CASSANDRA-4756:
----------------------------------------
OK, but Row 2 was replicated to nodes C and D.
We loaded the snapshot on C with --one-copy=1, so if I'm understanding this
correctly, that sent Row 2 to D. That means we need to load the snapshot on
D with --one-copy=0 to send Row 2 to C.
Row 3 is stored on D and A, though. We've loaded both of those snapshots with
--one-copy=0, so Row 3 only got sent to D.
Unless I'm misunderstanding something.
> Bulk loading snapshots creates RF^2 copies of the data
> ------------------------------------------------------
>
> Key: CASSANDRA-4756
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4756
> Project: Cassandra
> Issue Type: Improvement
> Affects Versions: 1.2.0 beta 1
> Reporter: Nick Bailey
>
> Since a cluster snapshot will contain RF copies of each piece of data,
> bulk loading all of those snapshots will create RF^2 copies of each piece of
> data.
> Not sure what the solution here is. Ideally we would merge the RF copies of
> the data before sending to the cluster. This would resolve any inconsistencies
> that existed when the snapshot was taken.
> A more naive approach of only loading one of the RF copies and assuming there
> are no inconsistencies might be an easier goal for the near term though.