[ 
https://issues.apache.org/jira/browse/CASSANDRA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434171#comment-13434171
 ] 

Nicolas Favre-Felix commented on CASSANDRA-4482:
------------------------------------------------

Indeed, I've used "range" as Range<Token>, the range of tokens owned by a node; 
I should have made this clearer.
We are not using the MerkleTree class or its TreeRange objects, but updating a 
single ByteBuffer directly instead of creating the whole tree with its hundreds 
of internal objects. This is equivalent to updating the leaves alone, without 
propagating the hash upwards in the tree. Yes, that means comparing two trees 
is O(leaf count).

bq. I xor all these together to get my initial state, S. To update row A to row 
A', I need to take S xor hash(A) xor hash(A').

If you've lready xor'd all these together, S does include the hash of your 
existing row A. Updating A to A' hashes A' and returns S' = S xor hash(A'), 
which is hash(A') xor hash(A).

In practice, this is how it works step by step:

# Load existing buffers when the ColumnFamilyStore is created: per 
Range<Token>, load an existing buffer or create a new one initialized with 
zeros.
# ColumnFamilyStore.apply() is called with columns X and Y in row A. For 
instance, row A could have token 0x10, falling in the range (0x00, 0x20]. The 
incremental repair ByteBuffer for this range is 1 KB in size.
# Create a new digest and run Column.updateDigest() on X and Y sucessively. We 
end up with H = hash(X) xor hash(Y); H is 16 bytes long.
# Calculate O, the offset in the ByteBuffer that corresponds to H: in this 
case, it's around 512 since 0x10 is close to the middle of the range (0x00, 
0x20].
# For each byte i of H, we set buffer[O+i] = buffer[O+i] xor H[i].

During the repair session, the replicas send out their existing ByteBuffers for 
the range being repaired and replace them with empty ones that will receive 
subsequent inserts.

bq. And sync the BB saving with CF flushes so CL replay matches up, I imagine.

Yes. If you terminate Cassandra at this stage, the ByteBuffer is written to 
disk and will contains [0,0.... a few bytes of hash(X) xor hash(Y) around the 
middle ... 0,0,0,0].

                
> In-memory merkle trees for repair
> ---------------------------------
>
>                 Key: CASSANDRA-4482
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4482
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Marcus Eriksson
>
> this sounds cool, we should reimplement it in the open source cassandra;
> http://www.acunu.com/2/post/2012/07/incremental-repair.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to