[ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799216#comment-13799216
 ] 

Constance Eustace commented on CASSANDRA-6137:
----------------------------------------------

I'm about to do a mandated OS upgrade. I will try HEAD when it comes back up

> CQL3 SELECT IN CLAUSE inconsistent
> ----------------------------------
>
>                 Key: CASSANDRA-6137
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE on EBS RAID 
> storage
> OSX Cassandra 1.2.8 on SSD storage
>            Reporter: Constance Eustace
>            Priority: Minor
>
> I am elevating this to Critical after doing some trace and reproducing in 
> several environments. No one has commented on this bug from the cassandra 
> team, and I view unreliable/corrupted data a pretty big deal. We are 
> considering pulling cassandra and using something else.
> We have the data state reproduced locally in an environment that we can set 
> TRACE logging, attach a debugger, etc. Some guidance as to where to look 
> would be greatly appreciated.
> ------------------------------------------------------------------
> We are encountering inconsistent results from CQL3 queries with column keys 
> using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc 
> driver.
> Rowkey is e_entid
> Column key is p_prop
> This returns roughly 21 rows for 21 column keys that match p_prop.
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
> These three queries each return one row for the requested single column key 
> in the IN clause:
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:complete:count');
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:all:count');
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:fail:count');
> This query returns ONLY ONE ROW (one column key), not three as I would expect 
> from the three-column-key IN clause:
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> This query does return two rows however for the requested two column keys:
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in (                  
>                               
> 'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> cqlsh> describe table internal_submission.entity_job;
> CREATE TABLE entity_job (
>   e_entid text,
>   p_prop text,
>   describes text,
>   dndcondition text,
>   e_entlinks text,
>   e_entname text,
>   e_enttype text,
>   ingeststatus text,
>   ingeststatusdetail text,
>   p_flags text,
>   p_propid text,
>   p_proplinks text,
>   p_storage text,
>   p_subents text,
>   p_val text,
>   p_vallang text,
>   p_vallinks text,
>   p_valtype text,
>   p_valunit text,
>   p_vars text,
>   partnerid text,
>   referenceid text,
>   size int,
>   sourceip text,
>   submitdate bigint,
>   submitevent text,
>   userid text,
>   version text,
>   PRIMARY KEY (e_entid, p_prop)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='NONE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);
> CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
> (dndcondition);
> CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
> (ingeststatus);
> CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
> (ingeststatusdetail);
> CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job 
> (referenceid);
> CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);
> CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);
> -------------------------------
> My suspicion is that the three-column-key IN Clause is translated (improperly 
> or not) to a two-column key range with the assumption that the third column 
> key is present in that range, but it isn't...
> -----------------------------------
> We have tried: nodetool cache invalidations, start/stop of cassandra. Those 
> did NOT fix the problem. A table dump (COPY TO) and then reload (COPY FROM) 
> does fix the rows, but then more corruption creeps in.
> We are using the cassandra-jdbc driver, but I don't see anything wrong with 
> the issued statements inside the cassandra source code when I step through 
> the code. 
> With additional writes, it may be possible that some rows get fixed. 
> Compaction or other jobs may repair this, but on the timescale of hours done 
> debugging, the failures are consistent. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to