[ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726641#comment-13726641 ]
Benedict commented on CASSANDRA-2698:
-------------------------------------

Good points. As to the changes to differenceHelper(), the row count lets us avoid breaking up contiguous ranges of differences that happen to be separated by unpopulated leaves. (I realised that using just the hash to determine whether a leaf was populated was dangerous, as you cannot distinguish between no rows and a non-zero number of empty rows.) In my previous patch, those broken-up ranges were generating a lot of ugly log messages.

After sending my patch last night, I must admit I began to doubt the sense of keeping these changes in; they were probably a hangover from wanting to retain what I could from the previous patch. I think "kill your babies" is the mantra to apply here: the change serves no purpose at the moment and, if we don't intend to send counts over the wire, would be actively dangerous. I'll strip out those changes, modify the messages and fire over another patch.

That said, I'd prefer to emit the lower bound as well, so we know the starting point: "~100: xxx" doesn't tell you whether the bucket covers 0-100 or 99-100, which might be useful information. This is only strictly needed for the first item, so we could emit it only there, but for neatness I'd probably retain it for all. Since we're dealing with integers, there's an easy fix: just bump both start and end by 1 and swap the brackets, so (0, 100] becomes [1, 101).

> Instrument repair to be able to assess it's efficiency (precision)
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Benedict
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz,
> patch_2698_v1.txt, patch.diff, patch-rebased.diff, patch.taketwo.alpha.diff,
> patch.taketwo.forreview.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data.
> One hypothesis is that the Merkle tree precision may deteriorate too much at
> some data size. To check this hypothesis, it would be reasonable to gather
> statistics during Merkle tree building on how many rows each Merkle tree
> range accounts for (and the size this represents). It is probably an
> interesting statistic to have anyway.
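The rows-per-range statistic the issue asks for, combined with the two-sided bucket labels proposed in the comment, might look roughly like the Java sketch below. It is a hypothetical illustration, not Cassandra's code: the class `RowsPerRangeHistogram`, its bucket offsets and the label format are all assumptions, and the caller is assumed to supply the row count of each Merkle tree leaf range. Buckets are assumed to be left-open and right-closed, (lo, hi], and are labelled [lo + 1, hi + 1), which covers exactly the same integers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Cassandra's API): a histogram of rows per Merkle
 * tree leaf range. Bucket i covers the integer interval (offsets[i-1], offsets[i]],
 * with an assumed exclusive lower bound of -1 for the first bucket so 0 is included.
 */
class RowsPerRangeHistogram {
    private final long[] offsets; // inclusive upper bounds, strictly increasing
    private final long[] counts;  // one count per bucket, plus a final overflow bucket

    RowsPerRangeHistogram(long[] offsets) {
        for (int i = 1; i < offsets.length; i++)
            if (offsets[i] <= offsets[i - 1])
                throw new IllegalArgumentException("offsets must be strictly increasing");
        this.offsets = offsets.clone();
        this.counts = new long[offsets.length + 1];
    }

    /** Records the number of rows that fell into one leaf range. */
    void add(long rowsInRange) {
        int i = 0;
        while (i < offsets.length && rowsInRange > offsets[i]) i++;
        counts[i]++; // i == offsets.length means the value exceeded every offset
    }

    /**
     * Labels each non-empty bucket with both bounds. Over the integers,
     * (lo, hi] equals [lo + 1, hi + 1), so both ends are bumped by one and
     * the brackets are swapped.
     */
    List<String> labels() {
        List<String> out = new ArrayList<>();
        long lo = -1; // exclusive lower bound of the first bucket
        for (int i = 0; i < offsets.length; i++) {
            if (counts[i] > 0)
                out.add("[" + (lo + 1) + ", " + (offsets[i] + 1) + "): " + counts[i]);
            lo = offsets[i];
        }
        if (counts[offsets.length] > 0)
            out.add("[" + (lo + 1) + ", inf): " + counts[offsets.length]);
        return out;
    }

    public static void main(String[] args) {
        RowsPerRangeHistogram h = new RowsPerRangeHistogram(new long[] {0, 1, 10, 100});
        // Hypothetical row counts for seven leaf ranges.
        for (long rows : new long[] {0, 0, 1, 5, 99, 100, 250}) h.add(rows);
        h.labels().forEach(System.out::println);
    }
}
```

With offsets {0, 1, 10, 100}, leaves holding 99 and 100 rows are both reported under "[11, 101): 2", so the bucket's starting point is visible instead of just "~100".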