[
https://issues.apache.org/jira/browse/CASSANDRA-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Harsh Desai updated CASSANDRA-21165:
------------------------------------
Description:
During load testing of a Cassandra 5.0.6 cluster, we came across an unusual issue
wherein a lightweight CQL query times out.
Upon further analysis, we found that the query executed on the server
side does not appear to be the same as the one sent by the driver.
{+}Client side code{+}:
this.statement = session.prepare(SimpleStatement.newInstance(
    "SELECT column1 from \"kspace\".\"tsTable\" WHERE key = ? AND key2 = ? "
    + "ORDER BY column1 DESC LIMIT 1").setIdempotent(true));
{+}Cassandra server audit logs{+}:
FileAuditLogger.java:51 -
...|type:REQUEST_FAILURE|category:ERROR|ks:kspace|scope:tsTable|operation:SELECT
column1 from "kspace"."tsTable" WHERE key = ? AND key2 = ? ORDER BY column1
DESC LIMIT 1; Operation timed out - received only 1 responses.
{+}Cassandra server logs{+}:
NoSpamLogger.java:104 - ...ReadTimeoutException "Operation timed out - received
only 1 responses." while executing SELECT {color:#ff0000}***{color} FROM
"kspace"."tsTable" WHERE key = c001c5c2-f0a7-1046-115d-edb4b67ab0d9 AND key2 =
'2026-02' ORDER BY column1 DESC, {color:#ff0000}*column2 ASC, column3 DESC,
column4 DESC*{color} LIMIT 1 {color:#ff0000}*ALLOW FILTERING*{color}
{+}Replica node logs{+}:
.. [WARN ] [ReadStage-68] cluster_id=1 ip_address=1.1.1.1
NoSpamLogger.java:107 - /2.2.2.2:7000->/3.3.3.3:7000-LARGE_MESSAGES-2acb4e9d
overloaded; dropping 1.779MiB message (queue: 131.653MiB local, 127.653MiB
endpoint, 127.653MiB global)
{+}Table Schema{+}:
||Column||Type||Key type||
|key|TIMEUUID|Partition Key|
|key2|TEXT|Partition Key|
|column1|BIGINT|Clustering Column ASC|
|column2|TIMEUUID|Clustering Column DESC|
|column3|BOOLEAN|Clustering Column ASC|
|column4|TEXT|Clustering Column ASC|
|value|BLOB| |
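For anyone trying to reproduce this, the schema table above can be approximated by the DDL below. This is a reconstruction, not the DDL actually used: the order of the two partition key columns is assumed from the table, all table options other than the clustering order are unknown and omitted, and the class name `TsTableSchema` is purely illustrative. The keyspace "kspace" is assumed to exist already.

```java
// Sketch only: DDL rebuilt from the "Table Schema" section of this report.
class TsTableSchema {

    // Partition key order (key, key2) and the absence of other table options are assumptions.
    static final String CREATE_TABLE =
          "CREATE TABLE IF NOT EXISTS \"kspace\".\"tsTable\" (\n"
        + "    key     timeuuid,\n"
        + "    key2    text,\n"
        + "    column1 bigint,\n"
        + "    column2 timeuuid,\n"
        + "    column3 boolean,\n"
        + "    column4 text,\n"
        + "    value   blob,\n"
        + "    PRIMARY KEY ((key, key2), column1, column2, column3, column4)\n"
        + ") WITH CLUSTERING ORDER BY (column1 ASC, column2 DESC, column3 ASC, column4 ASC)";

    public static void main(String[] args) {
        // Print the statement so it can be pasted into cqlsh or passed to session.execute(...).
        System.out.println(CREATE_TABLE);
    }
}
```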
Attached is the CQL query TRACING output (executed separately), which shows that
the message transmitted from the replica node is a large one.
The query sent by the driver is lightweight, while the one executed on the
server is not: it appears to fetch all the columns, including the blob, which
was not requested. This is consistent with the replica's response being a
large message that was dropped. In addition, the query unexpectedly runs with
“ALLOW FILTERING”, which is detrimental to query performance.
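One observation that may help with triage: the extra ORDER BY terms in the server log are exactly the table's clustering order with every direction flipped. The sketch below checks this using only the values quoted in this report; the class and method names are illustrative. Whether this means the server is merely logging an internal, reversed-read form of the query, rather than executing a different one, is an interpretation that the logs here do not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: compares the logged ORDER BY with the reversed clustering order from the schema.
class ReversedClusteringOrder {

    /** Builds an ORDER BY clause that flips ASC/DESC for every clustering column. */
    static String reversedOrderBy(Map<String, String> clusteringOrder) {
        StringJoiner joiner = new StringJoiner(", ", "ORDER BY ", "");
        for (Map.Entry<String, String> e : clusteringOrder.entrySet()) {
            String flipped = e.getValue().equals("ASC") ? "DESC" : "ASC";
            joiner.add(e.getKey() + " " + flipped);
        }
        return joiner.toString();
    }

    /** Clustering columns and directions copied from the Table Schema section. */
    static Map<String, String> schemaClusteringOrder() {
        Map<String, String> order = new LinkedHashMap<>();
        order.put("column1", "ASC");
        order.put("column2", "DESC");
        order.put("column3", "ASC");
        order.put("column4", "ASC");
        return order;
    }

    public static void main(String[] args) {
        // ORDER BY clause as it appears in the Cassandra server log (markup removed).
        String logged = "ORDER BY column1 DESC, column2 ASC, column3 DESC, column4 DESC";
        String computed = reversedOrderBy(schemaClusteringOrder());
        System.out.println(computed);
        System.out.println("matches server log: " + computed.equals(logged));
    }
}
```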
> Query read timeout potentially due to altered query on server side
> ------------------------------------------------------------------
>
> Key: CASSANDRA-21165
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21165
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Harsh Desai
> Priority: Urgent
> Attachments: CQL_TRACING_Output.txt