[ 
https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572659#comment-14572659
 ] 

Mathijs Vogelzang commented on CASSANDRA-9540:
----------------------------------------------

I was able to make this case reproducible by extracting the key from our 
production database and saving it with sstable2json.
It seems that the broken keys have one somewhat larger cell (144 kB in this 
case).

A test-case JSON file is available at 
https://www.dropbox.com/s/kzu2jmqmwz788k8/testcase.json?dl=0 (the SSTable that 
it creates is available at 
https://www.dropbox.com/s/ilwpjka5r70n2os/test-testbug-ka-2-Data.db?dl=0).
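
For context, here is a minimal sketch (my own, hypothetical; the actual testcase.json is in the Dropbox link above) of how such a file could be generated, assuming the Cassandra 2.1-era sstable2json layout of one object per partition with a hex "key" and "cells" as [name_hex, value_hex, timestamp] triples:

```python
import json

def to_hex(s):
    """Encode a text value the way sstable2json shows blobs: as hex."""
    return s.encode("utf-8").hex()

def make_partition(key, cells, timestamp=1433000000000000):
    """Build one partition entry; timestamp is an arbitrary microsecond value."""
    return {
        "key": to_hex(key),
        "cells": [[to_hex(name), to_hex(value), timestamp]
                  for name, value in cells],
    }

# The "broken" row carries one large (~144 kB) cell in bucket/0, mirroring the
# observation above; the payload here is just filler bytes.
big_value = "x" * (144 * 1024)
partitions = [
    make_partition("test",   [("bucket/0", "small"), ("metadata", "meta")]),
    make_partition("broken", [("bucket/0", big_value), ("metadata", "meta")]),
]

with open("testcase.json", "w") as f:
    json.dump(partitions, f)
```

The exact field names and timestamp encoding are what json2sstable expects in 2.1; treat this as a sketch, not a byte-for-byte reproduction of the linked file.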

In cqlsh, I executed the following commands:
{noformat}
create keyspace test with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
use test;
CREATE TABLE test.testbug (
    key blob,
    column1 blob,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 1.0
    AND speculative_retry = 'NONE';
{noformat}
Then I injected the JSON on the command line as follows:
{noformat}
json2sstable -K test -c testbug testcase.json DATADIR/test/testbug/test-testbug-ka-1-Data.db
nodetool refresh test testbug
{noformat}

cqlsh then behaves as reported earlier: querying row "test" works as expected, 
but for row "broken" data goes missing as soon as we ask for columns that don't 
exist in that row (depending on the exact set of requested columns, either 
'metadata' is dropped or no data is returned at all):
{noformat}
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('test') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
            test |            bucket/0
            test |            metadata

(2 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
          broken |            bucket/0
          broken |            metadata

(2 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('test') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdf'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
            test |            bucket/0
            test |            metadata

(2 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdf'));

 key | column1 | value
-----+---------+-------

(0 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
          broken |            bucket/0

(1 rows)
{noformat}
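
To make the expected semantics explicit, here is a toy reference model (my own sketch, not Cassandra code): a names query with IN should simply return the stored clustering columns intersected with the requested set, no matter how many non-existent names are included.

```python
def expected_in_result(stored, requested):
    """Expected result of `column1 IN (...)`: stored columns that were asked
    for, ordered per the table's CLUSTERING ORDER BY (column1 ASC)."""
    return sorted(set(stored) & set(requested))

# Both rows actually contain exactly these two columns.
stored = ["bucket/0", "metadata"]

# Matches the behaviour observed for row "test":
assert expected_in_result(stored, ["metadata", "bucket/0"]) == ["bucket/0", "metadata"]

# For row "broken", Cassandra returned 0 rows here instead of this expected result:
assert expected_in_result(
    stored,
    ["metadata", "bucket/0", "bucket/1", "bucket/2", "asdfasdf"],
) == ["bucket/0", "metadata"]
```

The transcripts above show the "test" row following this model for every requested set, while the "broken" row diverges as soon as non-existent names are added.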

> Cql query doesn't return right information when using IN on columns for some 
> keys
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9540
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>         Environment: Cassandra 2.1.5
>            Reporter: Mathijs Vogelzang
>
> We are investigating a weird issue where one of our clients doesn't get data 
> on his dashboard. It seems Cassandra is not returning data for a particular 
> key ("brokenkey" from now on).
> Some background:
> We have a row where we store a "metadata" column and data in columns 
> "bucket/0", "bucket/1", "bucket/2", etc. Depending on the date selection in 
> the UI, we know we only need to retrieve bucket/0, or bucket/0 and bucket/1, 
> etc. (we always need to retrieve "metadata").
> A typical query may look like this (using SELECT column1 to just show what is 
> returned, normally we would of course do SELECT value):
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> {noformat}
> These two queries work as expected, and return the information that we 
> actually stored.
> However, when we "filter" for certain columns, the brokenkey starts behaving 
> very weirdly:
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> ***  As expected, querying for more information doesn't really matter for the 
> working key ***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
> (1 rows)
> *** Cassandra stops giving us the metadata column when asking for a few more 
> columns! ***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  key | column1 | value
> -----+---------+-------
> (0 rows)
> *** Adding the bogus column name even makes it return nothing from this row 
> anymore! ***
> {noformat}
> There are at least two rows that malfunction like this in our table (which is 
> quite old already and has gone through a bunch of Cassandra upgrades). I've 
> upgraded our whole cluster to 2.1.5 (we were on 2.1.2 when I discovered this 
> problem) and compacted, repaired and scrubbed this column family, which 
> hasn't helped.
> Our table structure is:
> {noformat}
> cqlsh:AppBrain> describe table "GroupedSeries";
> CREATE TABLE "AppBrain"."GroupedSeries" (
>     key blob,
>     column1 blob,
>     value blob,
>     PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE
>     AND CLUSTERING ORDER BY (column1 ASC)
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 1.0
>     AND speculative_retry = 'NONE';
> {noformat}
> Let me know if I can give more information that may be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
