[
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791607#comment-13791607
]
Constance Eustace commented on CASSANDRA-6137:
----------------------------------------------
It is now occurring in prod for other columns. There appears to be some hash
key impacts here...
[10/10/13 12:13:19 AM] Aaron Gaalswyk: wasn't a regression; DB corruption on a
product entity
[10/10/13 12:13:20 AM] Aaron Gaalswyk: cqlsh> SELECT
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Product WHERE e_entid =
'0d5acd67-3131-11e3-85d7-126aad0075d4-PROD' AND p_prop IN
('__CPSYS_type','__CPSYS_name','urn:bby:pcm:job:id');
(0 rows)
cqlsh> SELECT
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Product WHERE e_entid =
'0d5acd67-3131-11e3-85d7-126aad0075d4-PROD' AND p_prop IN
('__CPSYS_type','__CPSYS_name');
e_entid | e_entname
| e_enttype | p_prop
| p_flags | p_propid | p_val | p_vallinks | p_vars
-------------------------------------------+--------------------------------------------------------------------------------+---------------------+--------------+---------+----------+-------+------------+--------
0d5acd67-3131-11e3-85d7-126aad0075d4-PROD | 1 ft Cat5e Non Booted UTP
Unshielded Network Patch Cable :::: 757120254621|NEW | null |
__CPSYS_name | null | null | null | null | null
0d5acd67-3131-11e3-85d7-126aad0075d4-PROD |
null | urn:bby:pcm:product |
__CPSYS_type | null | null | null | null | null
(2 rows)
[10/10/13 12:20:12 AM] AGaal: cqlsh> SELECT
e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Product WHERE e_entid =
'0d5acd67-3131-11e3-85d7-126aad0075d4-PROD' AND p_prop IN
('urn:bby:pcm:job:id');
(0 rows)
[10/10/13 12:20:12 AM] AGaal: note that in this example 'urn:bby:cpm:job:id'
does not exist yet, so asking just for that correctly returns 0 rows:
[10/10/13 12:20:42 AM] AGaal: but if it's included in a where in() with 2 other
properties that do exist, then 0 rows are also returned there too, which is bad
[10/10/13 12:26:50 AM] AGaal: another work-around for where in() might be to do
a select for each desired property, so in this case there would have been 3
selects; could this be faster / more efficient than selecting all?
[10/10/13 12:37:51 AM] AGaal: we might be able to get some traction here by
enabling some cassandra logging and playing with the query
[10/10/13 12:38:29 AM] AGaal: like if the property name is shortened to
'urn:bby:pcm:', it returns the expected 2 rows
[10/10/13 12:39:08 AM] AGaal: but if it's 'urn:bby:pcm:j' or ''urn:bby:pcm:d '
it finds 0
[10/10/13 12:42:41 AM] AGaal: and if the last letter after urn:bby:cpm: is an
'a' or 'b' or 'c' it also returns 2…. and it's consistent with this. So it's
finding some sort of match in certain strings… like via a hash or startsWith or
something
> CQL3 SELECT IN CLAUSE inconsistent
> ----------------------------------
>
> Key: CASSANDRA-6137
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE
> Reporter: Constance Eustace
> Fix For: 2.0.1
>
>
> We are encountering inconsistent results from CQL3 queries with column keys
> using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc
> driver.
> Rowkey is e_entid
> Column key is p_prop
> This returns roughly 21 rows for 21 column keys that match p_prop.
> cqlsh> SELECT
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
> FROM internal_submission.Entity_Job WHERE e_entid =
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
> These three queries each return one row for the requested single column key
> in the IN clause:
> SELECT
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
> FROM internal_submission.Entity_Job WHERE e_entid =
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_prop in
> ('urn:bby:pcm:job:ingest:content:complete:count');
> SELECT
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
> FROM internal_submission.Entity_Job WHERE e_entid =
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_prop in
> ('urn:bby:pcm:job:ingest:content:all:count');
> SELECT
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
> FROM internal_submission.Entity_Job WHERE e_entid =
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_prop in
> ('urn:bby:pcm:job:ingest:content:fail:count');
> This query returns ONLY ONE ROW (one column key), not three as I would expect
> from the three-column-key IN clause:
> cqlsh> SELECT
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
> FROM internal_submission.Entity_Job WHERE e_entid =
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_prop in
> ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> This query does return two rows however for the requested two column keys:
> cqlsh> SELECT
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
> FROM internal_submission.Entity_Job WHERE e_entid =
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_prop in (
>
> 'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> cqlsh> describe table internal_submission.entity_job;
> CREATE TABLE entity_job (
> e_entid text,
> p_prop text,
> describes text,
> dndcondition text,
> e_entlinks text,
> e_entname text,
> e_enttype text,
> ingeststatus text,
> ingeststatusdetail text,
> p_flags text,
> p_propid text,
> p_proplinks text,
> p_storage text,
> p_subents text,
> p_val text,
> p_vallang text,
> p_vallinks text,
> p_valtype text,
> p_valunit text,
> p_vars text,
> partnerid text,
> referenceid text,
> size int,
> sourceip text,
> submitdate bigint,
> submitevent text,
> userid text,
> version text,
> PRIMARY KEY (e_entid, p_prop)
> ) WITH
> bloom_filter_fp_chance=0.010000 AND
> caching='KEYS_ONLY' AND
> comment='' AND
> dclocal_read_repair_chance=0.000000 AND
> gc_grace_seconds=864000 AND
> index_interval=128 AND
> read_repair_chance=0.100000 AND
> replicate_on_write='true' AND
> populate_io_cache_on_flush='false' AND
> default_time_to_live=0 AND
> speculative_retry='NONE' AND
> memtable_flush_period_in_ms=0 AND
> compaction={'class': 'SizeTieredCompactionStrategy'} AND
> compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);
> CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job
> (dndcondition);
> CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job
> (ingeststatus);
> CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job
> (ingeststatusdetail);
> CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job
> (referenceid);
> CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);
> CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);
> -------------------------------
> My suspicion is that the three-column-key IN Clause is translated (improperly
> or not) to a two-column key range with the assumption that the third column
> key is present in that range, but it isn't...
--
This message was sent by Atlassian JIRA
(v6.1#6144)