[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229818#comment-14229818
 ] 

Benedict commented on CASSANDRA-7688:
-------------------------------------

This is a fundamentally difficult problem: answering it accurately basically 
requires a full compaction. We can easily track or estimate this data for any 
single sstable, and we can estimate the number of overlapping partitions 
between two sstables (though I'm unsure how accurate that estimate would be if 
composed across many sstables), but we cannot say how many rows within each 
overlapping partition overlap. The best we could do is probably to sample some 
overlapping partitions to see what proportion of row overlap tends to prevail, 
and hope the sample is representative; if we assume the overlap ratio is 
normally distributed, we could also return error bounds.
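
As a rough sketch of the sampling idea (not Cassandra code, and not a proposed 
implementation): suppose we have already measured the row-overlap ratio for a 
handful of sampled overlapping partitions. The function names, the 95% z-value 
and the definition of "overlap ratio" as the fraction of the smaller 
partition's rows that also appear in the larger one are all assumptions made 
for illustration.

```python
import math
import statistics


def estimate_overlap(sampled_ratios, z=1.96):
    """Return (mean, lower, upper) for the row-overlap ratio.

    sampled_ratios: overlap ratios in [0, 1], one per sampled partition.
    z: normal quantile for the interval (1.96 is roughly 95%); assumes the
       ratios are approximately normally distributed, as the comment suggests.
    """
    n = len(sampled_ratios)
    if n < 2:
        raise ValueError("need at least two sampled partitions")
    mean = statistics.fmean(sampled_ratios)
    # Standard error of the mean from the sample standard deviation.
    stderr = statistics.stdev(sampled_ratios) / math.sqrt(n)
    margin = z * stderr
    # Clamp to [0, 1] because a ratio cannot fall outside that range.
    return mean, max(0.0, mean - margin), min(1.0, mean + margin)


def estimate_merged_rows(rows_a, rows_b, overlap_ratio):
    """Estimate distinct rows after merging two copies of one partition.

    Hypothetical model: overlap_ratio is the fraction of the smaller copy's
    rows that also exist in the larger copy.
    """
    return rows_a + rows_b - overlap_ratio * min(rows_a, rows_b)


if __name__ == "__main__":
    mean, low, high = estimate_overlap([0.2, 0.4, 0.6])
    print(f"overlap ratio ~ {mean:.2f} (bounds {low:.2f} .. {high:.2f})")
    print("merged rows ~", estimate_merged_rows(100, 50, mean))
```

The bounds are only as good as the sample and the normality assumption; with 
few samples, or with skewed overlap, they would be optimistic.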

I don't think this data could be maintained live, at least not accurately or 
without significant cost. It would have to be an on-demand calculation, and a 
moderately expensive one.

> Add data sizing to a system table
> ---------------------------------
>
>                 Key: CASSANDRA-7688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremiah Jordan
>             Fix For: 2.1.3
>
>
> Currently you can't implement something similar to describe_splits_ex purely 
> from a native protocol driver.  
> https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily 
> getting ownership information to a client in the java-driver.  But you still 
> need the data sizing part to get splits of a given size.  We should add the 
> sizing information to a system table so that native clients can get to it.



