[
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626520#comment-13626520
]
Benedict commented on CASSANDRA-2698:
-------------------------------------
Hi Yuki,
Without in some way collecting (or at least sampling) the sizes of the
differences, I don't know what bucket sizes to use. Since I need to reinsert
all the records once I've decided on the buckets anyway, I need to retain them
all, which I chose to do in EstimatedHistogram since they do, in effect,
constitute a histogram. I also sample the largest records, which I figured
could be useful for debugging purposes (though that was just a guess). I don't
see why retaining 1000s of items is a major issue?
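
To make the "retain everything, then pick buckets" idea concrete, here is a
minimal sketch. It is not the code from the patch: the class and method names
are hypothetical, and choosing bounds at evenly spaced ranks is only one
possible way of sizing buckets from retained values.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Cassandra or of the attached patch.
class DeferredBuckets {
    /**
     * Chooses inclusive upper bounds for up to bucketCount buckets by taking
     * the retained values at evenly spaced ranks. Duplicate bounds are
     * collapsed, so fewer buckets may come back than were requested.
     */
    static long[] chooseBounds(long[] samples, int bucketCount) {
        if (samples.length == 0 || bucketCount <= 0) {
            return new long[0];
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        long[] bounds = new long[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            // Rank of the last retained value that should fall in bucket i.
            int rank = (int) ((long) (i + 1) * sorted.length / bucketCount) - 1;
            bounds[i] = sorted[Math.max(rank, 0)];
        }
        return Arrays.stream(bounds).distinct().toArray();
    }

    /**
     * Reinserts every retained value into the bucket whose bound is the first
     * one greater than or equal to it. Assumes bounds came from chooseBounds
     * on the same samples, so the last bound is the largest sample.
     */
    static long[] count(long[] samples, long[] bounds) {
        long[] counts = new long[bounds.length];
        for (long s : samples) {
            int idx = Arrays.binarySearch(bounds, s);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first bound greater than s
            }
            counts[idx]++;
        }
        return counts;
    }
}
```

The cost of this approach is that every value is held in memory until the
bounds are known, which is the trade-off discussed above.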
I agree that logging is suboptimal for this data. Presumably similar data for
other tasks may be optionally logged in future, and so I would guess this
should form part of a discussion about metric logging?
{quote}
fix coding style (especially whitespace) to match other code.
{quote}
Do you have an Eclipse formatter profile I could use for your coding
convention? I did my best to keep it correct manually, but it is difficult to
spot differences in an unfamiliar convention. Whitespace should be
comparatively easy though.
{quote}
EstimatedHistogram#testGroupBy is failing.
{quote}
Noted - will fix and resubmit
{quote}
comparator in Arrays#sort in EstimatedHistogram#logSummary has the same
conditions in both if and else if.
{quote}
Thanks, good spot. I'm surprised Eclipse didn't warn me.
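
For reference, the kind of bug described in that review comment, and one way
to avoid it, might look like the sketch below. The names and the
{key, size} record layout are assumptions made for illustration; this is not
the actual comparator from EstimatedHistogram#logSummary.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical example; each record is a long[] of the form {key, size}.
class LargestFirst {
    // Buggy shape: both branches test the same condition, so the second
    // branch is unreachable and smaller-vs-larger is reported as "equal".
    static final Comparator<long[]> BROKEN = (a, b) -> {
        if (a[1] > b[1]) {
            return -1;
        } else if (a[1] > b[1]) {
            return 1;
        }
        return 0;
    };

    // Delegating to Long.compare leaves no pair of branches to get out of
    // sync. The arguments are swapped to sort by size in descending order.
    static final Comparator<long[]> BY_SIZE_DESC =
        (a, b) -> Long.compare(b[1], a[1]);

    /** Returns a copy of the records sorted with the largest size first. */
    static long[][] sortLargestFirst(long[][] records) {
        long[][] copy = records.clone();
        Arrays.sort(copy, BY_SIZE_DESC);
        return copy;
    }
}
```

Because the broken version never returns a positive value, it violates the
Comparator contract, and the resulting sort order is not guaranteed.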
> Instrument repair to be able to assess its efficiency (precision)
> -----------------------------------------------------------------
>
> Key: CASSANDRA-2698
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Benedict
> Priority: Minor
> Labels: lhf
> Attachments: nodetool_repair_and_cfhistogram.tar.gz,
> patch_2698_v1.txt, patch.diff, patch-rebased.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data.
> One hypothesis is that the merkle tree precision may deteriorate too much at
> some data size. To check this hypothesis, it would be reasonable to gather
> statistics during the merkle tree building on how many rows each merkle tree
> range accounts for (and the size that this represents). It is probably an
> interesting statistic to have anyway.