[
https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866245#comment-13866245
]
Nikolai Grigoriev edited comment on CASSANDRA-6407 at 1/9/14 3:21 AM:
----------------------------------------------------------------------
[~xedin] I have prepared a simple test that demonstrates the problem even
in a small single-node cluster. Interestingly, on such a small cluster with
no load at all, the query sometimes does work.
So, here is how I use it:
1. Set the RPC server type to hsha (rpc_server_type in cassandra.yaml)
2. Load the attached CQL file
3. In CQLSH, run:
use cassandra6407test ;
select * from my_test_table ;
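The steps above can be scripted so that a hang is reported instead of blocking the terminal. This is only a minimal sketch: `run_with_timeout` is a hypothetical helper, and it assumes `cqlsh` is on the PATH and accepts statements piped on standard input. The 30-second timeout is an arbitrary choice, not a value from the report.

```python
import subprocess

# Statements from step 3 of the reproduction.
STATEMENTS = "use cassandra6407test ;\nselect * from my_test_table ;\n"


def run_with_timeout(cmd, stdin_text, timeout_s):
    """Run cmd, feeding stdin_text to it.

    Returns ("ok", stdout) if the command finishes within timeout_s seconds,
    or ("hung", "") if it has to be killed after the timeout expires.
    """
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return ("hung", "")
    return ("ok", proc.stdout)


if __name__ == "__main__":
    # Assumption: cqlsh is installed locally and reads statements from stdin.
    status, output = run_with_timeout(["cqlsh"], STATEMENTS, 30)
    print(status)
```

Running this once with the server set to hsha and once with sync would give a repeatable comparison of the two configurations.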
In most cases this SELECT gets stuck forever. Sometimes, if you interrupt
it (after a while) and run it again, it returns all the data on the
second attempt. Sometimes it does not. If you restart CQLSH and run it again,
it gets stuck again. Specifying a LIMIT above 24-25 shows similar
behavior.
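The LIMIT threshold can be located by bisection rather than by hand. The sketch below is illustrative only: `largest_working_limit` and the `returns_ok` callback are hypothetical names, and the search assumes the hang is deterministic and monotonic in LIMIT, which the intermittent behavior described above may violate. In practice each probe would probably need to be repeated a few times.

```python
def largest_working_limit(returns_ok, lo, hi):
    """Return the largest limit in [lo, hi) for which returns_ok(limit) is True.

    Assumes returns_ok(lo) is True, returns_ok(hi) is False, and that the
    outcome flips exactly once somewhere between lo and hi.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if returns_ok(mid):
            lo = mid  # query came back: threshold is at or above mid
        else:
            hi = mid  # query hung: threshold is below mid
    return lo


if __name__ == "__main__":
    # Stand-in predicate for illustration; a real one would run
    # "select * from my_test_table LIMIT n" with a client-side timeout.
    def fake_probe(n):
        return n <= 24

    print(largest_working_limit(fake_probe, 1, 117))
```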
If you switch RPC server type to "sync" and restart, then "select * from
my_test_table ;" works all the time.
It almost feels like some sort of race condition or a timing issue somewhere
between the part that produces the query result and the part that streams it
back to the client.
The server config I have attached is simplified: I have disabled JNA, JEMalloc,
etc. to keep the configuration as close as possible to the default
installation.
> CQL/Thrift request hangs forever when querying more than certain amount of
> data
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-6407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6407
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Oracle Linux 6.4, JDK 1.7.0_25-b15, Cassandra 2.0.2
> Reporter: Nikolai Grigoriev
> Attachments: cassandra.jstack.gz, cassandra.yaml,
> cassandra6407test.cql.gz, system.log.gz
>
>
> I have a table like this (slightly simplified for clarity):
> {code}
> CREATE TABLE my_test_table (
> uid uuid,
> d_id uuid,
> a_id uuid,
> c_id text,
> i_id blob,
> data text,
> PRIMARY KEY ((uid, d_id, a_id), c_id, i_id)
> );
> {code}
> I have created just over a hundred (117, to be specific) sample entities
> with the same row key and different clustering keys. Each has a blob of
> approximately 4 KB.
> I have tried to fetch all of them with a query like this via CQLSH:
> {code}
> select * from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572
> and a_id=00000000-0000-4000-0000-000000000002 and
> d_id=00000000-0000-1e64-0000-000000000001 and c_id='list-2'
> {code}
> This query simply hangs in CQLSH; it does not return at all until I abort it.
> Then I started playing with the LIMIT clause and found that this query returns
> instantly (with good data) when I use LIMIT 55 but hangs forever when I use
> LIMIT 56.
> Then I tried to just query all "i_id" values like this:
> {code}
> select i_id from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572
> and a_id=00000000-0000-4000-0000-000000000002 and
> d_id=00000000-0000-1e64-0000-000000000001 and c_id='list-2'
> {code}
> And this query returns instantly with the complete set of 117 values. So I
> started thinking that it must be something about the total size of the
> response, not the number of results or the number of columns to be fetched in
> slices. So I tried another test:
> {code}
> select data from my_test_table where
> uid=44338526-7aac-4640-bcde-0f4663c07572 and
> a_id=00000000-0000-4000-0000-000000000002 and
> d_id=00000000-0000-1e64-0000-000000000001 and c_id='list-2' LIMIT 63
> {code}
> This query returns instantly, but if I change the limit to 64 it hangs
> forever. Since my blob is about 4 KB for each entity, it *seems* like the query
> hangs when the total size of the response exceeds 252-256 KB. This looks quite
> suspicious, especially because 256 KB is such a particular number. I am
> wondering if this has something to do with result paging.
> I did not test if the issue is reproducible outside of CQLSH but I do recall
> that I observed somewhat similar behavior when fetching relatively large data
> sets.
> I can consistently reproduce this problem on my cluster. I am also attaching
> the jstack output that I have captured when CQLSH was hanging on one of these
> queries.
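The size estimate in the description can be checked with a few lines of arithmetic. This is a rough sketch that counts only the approximately 4 KB payload per row, as the report does, and ignores protocol and column overhead. `response_kb` is a hypothetical helper name.

```python
BLOB_KB = 4  # approximate payload per row, as stated in the report


def response_kb(limit, blob_kb=BLOB_KB):
    """Rough response payload in KB for a query returning `limit` rows."""
    return limit * blob_kb


if __name__ == "__main__":
    # LIMIT 63 returns instantly, LIMIT 64 hangs.
    for limit in (63, 64):
        print(limit, response_kb(limit))
```

Under the same assumption, the `select *` query failing at LIMIT 56 would be consistent with a 256 KB threshold if each full row carried slightly more than 4.5 KB, which is plausible given the extra key columns, although the report does not confirm it.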