[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229818#comment-14229818 ]
Benedict commented on CASSANDRA-7688:
-------------------------------------

This is a fundamentally difficult problem, and answering it accurately basically requires a full compaction. We can easily track or estimate this data for any given sstable, and we can estimate the number of overlapping partitions between two sstables (though I'm unsure how accurate that would be if we composed the estimates across many sstables), but we cannot say how many rows within each overlapping partition overlap. The best we could do is probably sample some overlapping partitions to see what proportion of row overlap tends to prevail, and hope it is representative; if we assume a normal distribution of overlap ratio we could return error bounds.

I don't think it's likely this data could be maintained live, at least not accurately, or not without significant cost. It would be an on-demand calculation that would be moderately expensive.

> Add data sizing to a system table
> ---------------------------------
>
>                 Key: CASSANDRA-7688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremiah Jordan
>             Fix For: 2.1.3
>
>
> Currently you can't implement something similar to describe_splits_ex purely
> from a native protocol driver.
> https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily
> getting ownership information to a client in the java-driver. But you still
> need the data sizing part to get splits of a given size. We should add the
> sizing information to a system table so that native clients can get to it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
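
The sampling idea in the comment above can be sketched roughly as follows. This is only an illustration, not Cassandra code: the function names, the definition of "overlap ratio" (the fraction of one sstable's rows in shared partitions that also exist in the other sstable), and the input counts are all assumptions made for the example. It takes the ratios measured on a handful of sampled overlapping partitions, applies a normal approximation to get error bounds on the mean ratio, and turns that into a bounded estimate of the merged row count for two sstables.

```python
import math
import statistics


def overlap_ratio_interval(sample_ratios, z=1.96):
    """Return (mean, low, high) for the sampled row-overlap ratios.

    Assumes the ratios are roughly normally distributed; z=1.96 gives an
    approximate 95% interval. Bounds are clamped to [0, 1].
    """
    n = len(sample_ratios)
    if n < 2:
        raise ValueError("need at least two sampled partitions")
    mean = statistics.mean(sample_ratios)
    stderr = statistics.stdev(sample_ratios) / math.sqrt(n)
    low = max(0.0, mean - z * stderr)
    high = min(1.0, mean + z * stderr)
    return mean, low, high


def estimate_merged_rows(rows_a, rows_b, shared_rows_b, sample_ratios, z=1.96):
    """Estimate the row count after merging two sstables.

    rows_a, rows_b  -- per-sstable row counts (hypothetical inputs)
    shared_rows_b   -- rows of B that sit in partitions also present in A
    sample_ratios   -- per sampled partition, fraction of B's rows that
                       duplicate a row in A
    Returns (estimate, low, high). A higher overlap ratio means more
    duplicates, so the high ratio bound yields the low row-count bound.
    """
    mean, ratio_low, ratio_high = overlap_ratio_interval(sample_ratios, z)
    total = rows_a + rows_b
    estimate = total - mean * shared_rows_b
    low = total - ratio_high * shared_rows_b
    high = total - ratio_low * shared_rows_b
    return estimate, low, high


if __name__ == "__main__":
    est, low, high = estimate_merged_rows(1000, 800, 500, [0.2, 0.4, 0.6])
    print(f"merged rows ~ {est:.0f} (between {low:.0f} and {high:.0f})")
```

The bounds are only as good as the sample: if the sampled partitions are not representative, or the ratios are far from normal, the interval will be misleading, which is the caveat the comment itself raises.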