[
https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445841#comment-15445841
]
Sylvain Lebresne commented on CASSANDRA-12367:
----------------------------------------------
I'm not entirely convinced by the way this is implemented because:
# it iterates over every row, which sounds pretty wasteful, especially if the
goal is to have a cheap way to determine how big a partition is on disk (though
the description of the ticket could use a bit more in terms of motivation, so
I'm mainly guessing that's the intended use case).
# it uses {{Row#dataSize()}}, which only returns the size of the data contained
in the row, ignoring all the artifacts of serialization. It also ignores
range tombstones. Overall, this means the returned number doesn't really
represent the size on disk, and what it does represent is a bit ad hoc imo.
What I'd suggest instead is to use the index file, and return the actual size
of the data on disk (by simply subtracting the partition's start offset in the
sstable from its end offset). This would be *a lot* faster and imo more
meaningful (the only caveat being that it's still not the size on disk since it
ignores compression, but that's probably kind of ok).
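To make the offset subtraction concrete, here is a minimal sketch. It is not Cassandra code: {{PartitionSizeSketch}}, {{addEntry}} and {{serializedSize}} are hypothetical names, and the index is modelled as a plain in-memory map from partition key to start offset. It assumes partitions are laid out back to back in the data file, so a partition ends where the next one starts (or at the end of the file for the last one).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for one sstable's partition index; these are NOT
// Cassandra's real classes, only an illustration of the offset arithmetic.
final class PartitionSizeSketch {
    // Partition key -> offset of the partition's first byte in the data file.
    private final Map<String, Long> startByKey = new HashMap<>();
    // All start offsets, sorted, so the next partition's start can be found.
    private final TreeSet<Long> starts = new TreeSet<>();
    // Uncompressed length of the data file (assumed to be known).
    private final long dataFileLength;

    PartitionSizeSketch(long dataFileLength) {
        this.dataFileLength = dataFileLength;
    }

    void addEntry(String key, long startOffset) {
        startByKey.put(key, startOffset);
        starts.add(startOffset);
    }

    // Returns the serialized (uncompressed) size of the partition in bytes,
    // or -1 if the key is not present in this index.
    long serializedSize(String key) {
        Long start = startByKey.get(key);
        if (start == null) {
            return -1L;
        }
        // The partition ends where the next one begins, or at end of file.
        Long nextStart = starts.higher(start);
        long end = (nextStart != null) ? nextStart : dataFileLength;
        return end - start;
    }

    public static void main(String[] args) {
        PartitionSizeSketch index = new PartitionSizeSketch(650L);
        index.addEntry("a", 0L);
        index.addEntry("b", 120L);
        index.addEntry("c", 500L);
        System.out.println(index.serializedSize("a")); // 120
        System.out.println(index.serializedSize("b")); // 380
        System.out.println(index.serializedSize("c")); // 150
    }
}
```

No row is read here, only two offsets per lookup, which is why this is so much cheaper than iterating. A real implementation would presumably have to sum the result over every sstable that contains the partition.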
Regarding exposing that in CQL however, I'm pretty much -1 on the syntax
suggested. I agree with Tyler, this is way too weird to make such a special
case in CQL. This is very different from the {{ttl()}} and {{writetime()}}
methods for instance, in that those just return data that is part of CQL. This
metric implies a completely different path (since it's intrinsically a
local query) and result set, which means it'd be almost cleaner to have a fully
different statement, like {{GET_PARTITION_SIZE FROM foo WHERE ...}}, instead of
reusing {{SELECT}}. I'm *not* suggesting we add that either, since imo it's way
too ad hoc to justify the addition.
Don't get me wrong, I think this could be exposed much more elegantly once we
have virtual tables and I'll be happy to do so when we have them. And yes,
virtual tables will probably take a bit more time to come, but we'll have the
JMX call in the meantime.
> Add an API to request the size of a CQL partition
> -------------------------------------------------
>
> Key: CASSANDRA-12367
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12367
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 3.x
>
> Attachments: 12367-trunk-v2.txt, 12367-trunk.txt
>
>
> It would be useful to have an API that we could use to get the total
> serialized size of a CQL partition, scoped by keyspace and table, on disk.