[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626520#comment-13626520
 ] 

Benedict commented on CASSANDRA-2698:
-------------------------------------

Hi Yuki,

Without collecting (or at least sampling) the sizes of the differences in some 
way, I don't know what bucket sizes to use. Since I need to reinsert all the 
records once I've decided this anyway, I need to retain them all, which I chose 
to do in EstimatedHistogram because they do, in effect, constitute a histogram. 
I also sample the largest records, which I figure could be useful for debugging 
purposes (though that was just a guess). I don't see why thousands of items 
would be a major issue.
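
To make the idea above concrete, here is a minimal sketch of "retain everything, 
then pick buckets". It is not the code from the patch: DeferredHistogram, its 
methods, and the doubling bucket scheme are all assumptions made for 
illustration, and sizes are assumed to be non-negative and far below 
Long.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not the class from the patch: buffers every observed
// size, then derives bucket boundaries from the data before counting.
final class DeferredHistogram
{
    private final List<Long> values = new ArrayList<>();
    // Min-heap holding the largest values seen so far.
    private final PriorityQueue<Long> largest = new PriorityQueue<>();
    private final int keepLargest;

    DeferredHistogram(int keepLargest)
    {
        this.keepLargest = keepLargest;
    }

    void add(long size)
    {
        values.add(size);
        largest.add(size);
        if (largest.size() > keepLargest)
            largest.poll(); // drop the smallest of the retained values
    }

    // Upper bounds double from 1 until the last one covers the maximum.
    long[] bucketBounds()
    {
        long max = 0;
        for (long v : values)
            max = Math.max(max, v);
        List<Long> bounds = new ArrayList<>();
        long bound = 1;
        bounds.add(bound);
        while (bound < max)
        {
            bound *= 2;
            bounds.add(bound);
        }
        long[] result = new long[bounds.size()];
        for (int i = 0; i < result.length; i++)
            result[i] = bounds.get(i);
        return result;
    }

    // counts[i] is the number of values v with bounds[i-1] < v <= bounds[i].
    long[] bucketCounts()
    {
        long[] bounds = bucketBounds();
        long[] counts = new long[bounds.length];
        for (long v : values)
        {
            int i = 0;
            while (v > bounds[i])
                i++;
            counts[i]++;
        }
        return counts;
    }

    // The retained largest values, in ascending order.
    List<Long> largestValues()
    {
        List<Long> sorted = new ArrayList<>(largest);
        Collections.sort(sorted);
        return sorted;
    }
}
```

The cost is one buffered long per observation, which is why a few thousand 
items seems tolerable; the bounded heap keeps the "largest records" sample at 
a fixed size regardless of how many values are added.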

I agree that logging is suboptimal for this data. Presumably similar data for 
other tasks may be optionally logged in future, and so I would guess this 
should form part of a discussion about metric logging?

{quote}
fix coding style (especially whitespace) to match other code.
{quote}
Do you have an Eclipse formatter profile I could use for your coding 
convention? I did my best to keep it correct manually, but it is difficult to 
spot differences in an unfamiliar convention. Whitespace should be 
comparatively easy though.

{quote}
EstimatedHistogram#testGroupBy is failing.
{quote}
Noted - will fix and resubmit.

{quote}
comparator in Arrays#sort in EstimatedHistogram#logSummary has the same 
conditions in both if and else if.
{quote}
Thanks, good spot. I'm surprised Eclipse didn't warn me.
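
For reference, a sketch of the kind of mistake described and one way to avoid 
it. The actual comparator in the patch is not shown in this thread, so 
SizeOrdering, the long[] entry layout (size in index 1), and both comparators 
are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; entries are {id, size} pairs.
final class SizeOrdering
{
    // Buggy shape: both branches test the same condition, so the second
    // branch is unreachable and "greater than" is reported as equal.
    static final Comparator<long[]> BROKEN = (a, b) -> {
        if (a[1] < b[1]) return -1;
        else if (a[1] < b[1]) return 1;
        return 0;
    };

    // Delegating to Long.compare leaves no branch to get wrong.
    static final Comparator<long[]> FIXED = (a, b) -> Long.compare(a[1], b[1]);

    // Returns a copy of the entries sorted by ascending size.
    static long[][] sortBySize(long[][] entries)
    {
        long[][] copy = entries.clone();
        Arrays.sort(copy, FIXED);
        return copy;
    }
}
```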



                
> Instrument repair to be able to assess its efficiency (precision)
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Benedict
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz, 
> patch_2698_v1.txt, patch.diff, patch-rebased.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One 
> hypothesis is that the merkle tree precision may deteriorate too much at some 
> data size. To check this hypothesis, it would be reasonable to gather 
> statistics during the merkle tree building on how many rows each merkle tree 
> range accounts for (and the size that this represents). It is probably an 
> interesting statistic to have anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
