[ 
https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445841#comment-15445841
 ] 

Sylvain Lebresne commented on CASSANDRA-12367:
----------------------------------------------

I'm not entirely convinced by the way this is implemented because:
# it iterates over every row, which sounds pretty wasteful, especially if the 
goal is to have a cheap way to determine how big a partition is on disk (though 
the description of the ticket could use a bit more in terms of motivation, so 
I'm mainly guessing that's the intended use case).
# it uses {{Row#dataSize()}}, which only returns the size of the data contained 
in the row and ignores all the artifacts of serialization. It also ignores 
range tombstones. Overall, this means the returned number doesn't really 
represent the size on disk, and what it does represent is a bit ad hoc 
currently imo.

What I'd suggest instead is to use the index file, and return the actual size 
of the data on disk (by simply subtracting the offset of the start of the 
partition in the sstable from the offset of its end). This would be *a lot* 
faster and imo more meaningful (the only caveat being that it's still not the 
size on disk since it ignores compression, but that's probably kind of ok).
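
To make the offset subtraction concrete, here is a rough sketch of the idea in 
Python. It is not Cassandra code: {{index_entries}}, {{data_file_length}} and 
{{partition_size}} are hypothetical names, and the index is modelled as a plain 
list of (partition key, start offset) pairs in on-disk order. It assumes that 
the end of a partition is the start of the next one, or the end of the data 
file for the last partition.

```python
def partition_size(index_entries, data_file_length, key):
    """Return the uncompressed serialized size of `key` in one sstable.

    index_entries: list of (partition_key, start_offset) pairs, in the same
                   order as the partitions appear in the data file
                   (hypothetical stand-in for the sstable index).
    data_file_length: total uncompressed length of the data file.
    Returns None if the sstable does not contain the key.
    """
    for i, (entry_key, start) in enumerate(index_entries):
        if entry_key == key:
            if i + 1 < len(index_entries):
                # The partition ends where the next one starts.
                end = index_entries[i + 1][1]
            else:
                # Last partition: it runs to the end of the data file.
                end = data_file_length
            return end - start
    return None


# Example with made-up offsets.
entries = [("a", 0), ("b", 120), ("c", 450)]
print(partition_size(entries, 1000, "b"))  # 450 - 120
```

The linear scan is only there to keep the sketch short; the point is that no 
row is ever read, only two offsets.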

Regarding exposing that in CQL however, I'm pretty much -1 on the syntax 
suggested. I agree with Tyler, this is way too weird to make such a special 
case in CQL. This is very different from the {{ttl()}} and {{writetime()}} 
methods for instance, in that those just return data that is part of CQL. This 
metric implies a completely different path (since it's intrinsically a 
local query) and result set, which means it'd be almost cleaner to have a fully 
different statement, like {{GET_PARTITION_SIZE FROM foo WHERE ...}}, instead of 
reusing {{SELECT}}. I'm *not* suggesting we add that either, since imo it's way 
too ad hoc to justify the addition.

Don't get me wrong, I think this could be exposed much more elegantly once we 
have virtual tables and I'll be happy to do so when we have them. And yes, 
virtual tables will probably take a bit more time to come, but we'll have the 
JMX call in the meantime.


> Add an API to request the size of a CQL partition
> -------------------------------------------------
>
>                 Key: CASSANDRA-12367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12367
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Geoffrey Yu
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 12367-trunk-v2.txt, 12367-trunk.txt
>
>
> It would be useful to have an API that we could use to get the total 
> serialized size of a CQL partition, scoped by keyspace and table, on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
