Benjamin Lerer created CASSANDRA-17183:
------------------------------------------
Summary: Using the user specified page size for internal paging in
GROUP BY queries can slow down the query and create high traffic between nodes
Key: CASSANDRA-17183
URL: https://issues.apache.org/jira/browse/CASSANDRA-17183
Project: Cassandra
Issue Type: Bug
Reporter: Benjamin Lerer
When performing aggregation queries or GROUP BY queries, Cassandra computes the
aggregates on the coordinator node to ensure consistency and requests the data
in pages (a number of rows). Today, Cassandra uses the page size requested by
the user (the number of rows that should be returned to the user) as the
internal page size. As a consequence, if the page size requested by the user is
small, the number of requests performed by the coordinator will be much higher.
For 1,000,000 rows, a consistency level of LOCAL_QUORUM and a page size of
5,000, the coordinator will contact the replicas 200 times. For a page size of
100 (the CQLSH page size), the coordinator will contact the replicas 10,000
times.
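
The figures above follow from dividing the row count by the page size and
rounding up. A minimal sketch of that arithmetic, where the function name
paging_round_trips is an illustrative assumption and not part of Cassandra:

```python
def paging_round_trips(total_rows: int, page_size: int) -> int:
    """Number of internal pages needed to read total_rows rows."""
    if total_rows < 0 or page_size <= 0:
        raise ValueError("total_rows must be >= 0 and page_size must be > 0")
    # Ceiling division using integers only, so large counts stay exact.
    return -(-total_rows // page_size)


print(paging_round_trips(1_000_000, 5_000))  # 200
print(paging_round_trips(1_000_000, 100))    # 10000
```

This counts paging rounds only. It does not model how many replicas are
contacted in each round for a given consistency level.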
To avoid this problem, we should have a minimum page size for the internal
paging and give operators the ability to change its value.
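
One way to picture the proposal is as a floor applied to the user page size
before it is used for internal paging. The sketch below is only an
illustration: MIN_INTERNAL_PAGE_SIZE, its value of 5,000 and both function
names are hypothetical and do not correspond to any existing Cassandra
setting or code.

```python
# Hypothetical operator-configurable floor (name and value are assumptions).
MIN_INTERNAL_PAGE_SIZE = 5_000


def internal_page_size(user_page_size: int,
                       min_internal_page_size: int = MIN_INTERNAL_PAGE_SIZE) -> int:
    """Page size used between coordinator and replicas, never below the floor."""
    if user_page_size <= 0 or min_internal_page_size <= 0:
        raise ValueError("page sizes must be positive")
    return max(user_page_size, min_internal_page_size)


def round_trips(total_rows: int, user_page_size: int,
                min_internal_page_size: int = MIN_INTERNAL_PAGE_SIZE) -> int:
    """Internal paging rounds needed once the floor is applied."""
    size = internal_page_size(user_page_size, min_internal_page_size)
    # Ceiling division using integers only.
    return -(-total_rows // size)


# With the assumed floor, the CQLSH page size of 100 no longer costs 10,000 rounds.
print(round_trips(1_000_000, 100))     # 200
# A user page size above the floor is used unchanged.
print(round_trips(1_000_000, 10_000))  # 100
```

The floor only affects the internal requests. The number of rows returned to
the user per page would still be the page size the user asked for.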