[
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726641#comment-13726641
]
Benedict commented on CASSANDRA-2698:
-------------------------------------
Good points.
As to the changes to differenceHelper(), the row count makes it possible to
avoid breaking up contiguous ranges of differences that happen to be separated
by unpopulated leaves; that splitting was generating a lot of ugly log messages
in my previous patch. (I realised that using just the hash to determine whether
the data was populated was dangerous, as you cannot disambiguate between no
rows and a non-zero number of empty rows.) After sending my patch last night I
must admit I began to doubt the sense of keeping the changes in; they were
probably a hangover from wanting to retain what I could from the previous
patch. I think "kill your babies" is the mantra to apply here, as they don't
serve any purpose at the moment and, if we don't intend to send counts over
the wire, would be actively dangerous.
I'll strip out those changes, modify the messages and fire over another
patch.
That said, I'd prefer to emit the lower bound as well so we know the starting
point; "~100: xxx" doesn't tell you whether the distribution is 0-100 or
99-100, which might be useful information. This is only helpful for the first
item, so we could emit it only for that, but for neatness I'd probably retain
it for all; since we're dealing with integers there's an easy fix of just
bumping both start/end by 1 and swapping the brackets.
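To illustrate the bracket swap, here is a minimal sketch. It assumes each bucket covers the integers greater than its lower bound and up to and including its upper bound, which the comment does not actually state; the class and method names are hypothetical and not taken from the patch.

```java
// Hypothetical helper, not part of the patch: shows that over the integers
// the half-open range (lo, hi] is the same set as [lo + 1, hi + 1).
final class RangeLabels {
    private RangeLabels() {}

    /** Label for the bucket lo < n <= hi (assumed convention). */
    static String openClosed(long lo, long hi) {
        return "(" + lo + ", " + hi + "]";
    }

    /** Same bucket after bumping both bounds by 1 and swapping the brackets. */
    static String closedOpen(long lo, long hi) {
        return "[" + (lo + 1) + ", " + (hi + 1) + ")";
    }

    /** Membership test for (lo, hi]. */
    static boolean inOpenClosed(long n, long lo, long hi) {
        return n > lo && n <= hi;
    }

    /** Membership test for [start, end). */
    static boolean inClosedOpen(long n, long start, long end) {
        return n >= start && n < end;
    }
}
```

With these assumptions, a bucket from 0 to 100 can be written as either "(0, 100]" or "[1, 101)", and a bucket from 99 to 100 as either "(99, 100]" or "[100, 101)"; both forms expose the starting point that "~100" hides.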
> Instrument repair to be able to assess it's efficiency (precision)
> ------------------------------------------------------------------
>
> Key: CASSANDRA-2698
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Benedict
> Priority: Minor
> Labels: lhf
> Attachments: nodetool_repair_and_cfhistogram.tar.gz,
> patch_2698_v1.txt, patch.diff, patch-rebased.diff, patch.taketwo.alpha.diff,
> patch.taketwo.forreview.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One
> hypothesis is that the merkle tree precision may deteriorate too much at some
> data size. To check this hypothesis, it would be reasonable to gather
> statistics during the merkle tree building on how many rows each merkle tree
> range accounts for (and the size that this represents). It is probably an
> interesting statistic to have anyway.