[ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626520#comment-13626520 ]
Benedict commented on CASSANDRA-2698:
-------------------------------------

Hi Yuki,

Without collecting (or at least sampling) the sizes of the differences in some way, I don't know what bucket sizes to use. Since I need to reinsert all the records once I've decided this anyway, I need to retain them all, which I chose to do in EstimatedHistogram, as they do, in effect, constitute a histogram. I also sample the largest records, which I figure could be useful for debugging purposes (though that was just a guess). I don't see why 1000s of items is a major issue?

I agree that logging is suboptimal for this data. Presumably similar data for other tasks may be optionally logged in future, so I would guess this should form part of a discussion about metric logging?

{quote}
fix coding style (especially whitespace) to match other code.
{quote}
Do you have an Eclipse formatter profile I could use for your coding convention? I did my best to keep it correct manually, but it is difficult to spot differences in an unfamiliar convention. Whitespace should be comparatively easy though.

{quote}
EstimatedHistogram#testGroupBy is failing.
{quote}
Noted - will fix and resubmit.

{quote}
comparator in Arrays#sort in EstimatedHistogram#logSummary has the same conditions in both if and else if.
{quote}
Thanks, good spot. I'm surprised Eclipse didn't warn me.

> Instrument repair to be able to assess it's efficiency (precision)
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Benedict
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz, patch_2698_v1.txt, patch.diff, patch-rebased.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One
> hypothesis is that the merkle tree precision may deteriorate too much at some
> data size.
> To check this hypothesis, it would be reasonable to gather
> statistics during the merkle tree building on how many rows each merkle tree
> range accounts for (and the size that this represents). It is probably an
> interesting statistic to have anyway.
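
The two-pass approach described in the comment (retain the difference sizes first, then decide on bucket sizes and reinsert every record) can be sketched roughly as below. This is only an illustration under stated assumptions: the class `DifferenceSizeHistogram` and its methods are hypothetical names, the geometric bucket spacing is an assumed choice, and none of this is the API of Cassandra's EstimatedHistogram or the code in the attached patches.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: bucket bounds are derived from the retained sizes,
 * and only then are the sizes counted into buckets.
 */
final class DifferenceSizeHistogram {

    /**
     * Pass 1: choose `buckets` inclusive upper bounds, spaced geometrically
     * between the smallest and largest retained size (assumed spacing).
     */
    static long[] bucketBounds(long[] sizes, int buckets) {
        long min = Math.max(1, Arrays.stream(sizes).min().orElse(1));
        long max = Math.max(min, Arrays.stream(sizes).max().orElse(1));
        double ratio = Math.pow((double) max / min, 1.0 / buckets);

        long[] bounds = new long[buckets];
        double bound = min;
        for (int i = 0; i < buckets; i++) {
            bound *= ratio;
            long rounded = (long) Math.ceil(bound);
            // Keep the bounds strictly increasing even when rounding collides.
            bounds[i] = (i == 0) ? Math.max(rounded, min)
                                 : Math.max(rounded, bounds[i - 1] + 1);
        }
        // Guard against floating-point error leaving the maximum uncovered.
        bounds[buckets - 1] = Math.max(bounds[buckets - 1], max);
        return bounds;
    }

    /** Pass 2: reinsert every retained size into the bucket that covers it. */
    static long[] count(long[] sizes, long[] bounds) {
        long[] counts = new long[bounds.length];
        for (long size : sizes) {
            int idx = Arrays.binarySearch(bounds, size);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first bound greater than size
            }
            counts[Math.min(idx, bounds.length - 1)]++;
        }
        return counts;
    }
}
```

Calling `bucketBounds(sizes, n)` and then `count(sizes, bounds)` requires the full array of sizes to be held in memory between the two passes, which is the retention cost the comment refers to.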