[
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095348#comment-13095348
]
Sylvain Lebresne commented on CASSANDRA-2698:
---------------------------------------------
An EstimatedHistogram would be just fine. That, plus for each pair of merkle
trees the number of ranges that differ and the corresponding size of the
streamed data, would be a very good start imho.
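To make the first part concrete, here is a minimal sketch of the kind of per-range statistic meant here. It is a standalone stand-in with power-of-two buckets, not Cassandra's EstimatedHistogram, and the class and method names (RowsPerRangeHistogram, bucketFor, largestNonEmptyBucket) are made up for illustration.

```java
/**
 * Hypothetical sketch: counts merkle tree ranges by how many rows they cover,
 * using power-of-two buckets. This is NOT Cassandra's EstimatedHistogram.
 */
final class RowsPerRangeHistogram {
    // buckets[0] holds ranges with at most 1 row; for i >= 1, buckets[i] holds
    // ranges whose row count n satisfies 2^(i-1) < n <= 2^i.
    private final long[] buckets = new long[64];

    /** Index of the bucket that a range covering n rows falls into. */
    static int bucketFor(long n) {
        if (n <= 1) return 0;
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    /** Record one merkle tree range covering the given number of rows. */
    void add(long rowsInRange) {
        buckets[bucketFor(rowsInRange)]++;
    }

    long countInBucket(int i) {
        return buckets[i];
    }

    /** Number of ranges recorded so far. */
    long totalRanges() {
        long total = 0;
        for (long c : buckets) total += c;
        return total;
    }

    /** Highest bucket index with at least one range, or -1 if nothing was recorded. */
    int largestNonEmptyBucket() {
        for (int i = buckets.length - 1; i >= 0; i--) {
            if (buckets[i] > 0) return i;
        }
        return -1;
    }
}
```

If most ranges land in high buckets, each leaf of the tree covers many rows, which is the situation where a single differing row could trigger a large transfer.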
I think the only thing we need to figure out for this patch is where it makes
the most sense to record that data. What I mean here is that the merkle trees
are computed on each node participating in a repair (and thus that is where the
EstimatedHistogram can be computed), while the computation of the differences is
only done on the coordinator. But in an ideal world, it would seem more useful
to store that information together (for a given repair) because the two are
related.
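As a rough illustration of what "together" could look like, the sketch below keeps both kinds of data under one repair session. Everything in it is an assumption made for the example (the RepairSessionStats and TreePairDiff names, identifying endpoints by plain strings, histograms passed around as raw bucket counts); it does not describe how the patch does or should ship the per-node data to the coordinator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical container grouping all statistics for one repair session. */
final class RepairSessionStats {

    /** Coordinator-side result of comparing the merkle trees of two endpoints. */
    static final class TreePairDiff {
        final String endpointA;
        final String endpointB;
        final int differingRanges;   // number of ranges that differ between the two trees
        final long streamedBytes;    // size of the data streamed to fix those ranges

        TreePairDiff(String endpointA, String endpointB, int differingRanges, long streamedBytes) {
            this.endpointA = endpointA;
            this.endpointB = endpointB;
            this.differingRanges = differingRanges;
            this.streamedBytes = streamedBytes;
        }
    }

    private final String sessionId;
    // Per-node data: rows-per-range bucket counts computed while building each tree.
    private final Map<String, long[]> histogramByEndpoint = new HashMap<>();
    // Coordinator data: one entry per compared pair of trees.
    private final List<TreePairDiff> diffs = new ArrayList<>();

    RepairSessionStats(String sessionId) {
        this.sessionId = sessionId;
    }

    String sessionId() {
        return sessionId;
    }

    void recordHistogram(String endpoint, long[] bucketCounts) {
        // Copy so later changes by the caller do not alter the stored statistics.
        histogramByEndpoint.put(endpoint, Arrays.copyOf(bucketCounts, bucketCounts.length));
    }

    void recordDiff(TreePairDiff diff) {
        diffs.add(diff);
    }

    int endpointsWithHistogram() {
        return histogramByEndpoint.size();
    }

    int totalDifferingRanges() {
        int total = 0;
        for (TreePairDiff d : diffs) total += d.differingRanges;
        return total;
    }

    long totalStreamedBytes() {
        long total = 0;
        for (TreePairDiff d : diffs) total += d.streamedBytes;
        return total;
    }
}
```

With both pieces keyed by the same session, the streamed size per differing range can be read next to the rows-per-range distribution of the trees that produced it.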
> Instrument repair to be able to assess its efficiency (precision)
> -----------------------------------------------------------------
>
> Key: CASSANDRA-2698
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Priority: Minor
> Labels: lhf
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One
> hypothesis is that the merkle tree precision may deteriorate too much at some
> data size. To check this hypothesis, it would be reasonable to gather
> statistics during the merkle tree building on how many rows each merkle tree
> range accounts for (and the size that this represents). It is probably an
> interesting statistic to have anyway.