[jira] [Commented] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent

Constance Eustace (JIRA) Tue, 15 Oct 2013 14:34:57 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795676#comment-13795676
 ]


Constance Eustace commented on CASSANDRA-6137:
----------------------------------------------

Some debugging of this query:

SELECT * FROM wayfair_submission.entity_job WHERE e_entid = 
'924d6742-31fd-11e3-97f7-001c42000009-CJOB' AND p_prop IN 
('__CPSYS_name','urn:bby:pcm:ingest:status','subPropA:filttest:sdf','urn@bby@pcm@job@ingest@content@complete@count')

SelectRawStatement[name=wayfair_submission.entity_job, selectClause=[], 
whereClause=[e_entid EQ '924d6742-31fd-11e3-97f7-001c42000009-CJOB', p_prop IN 
['__CPSYS_name', 'urn:bby:pcm:ingest:status', 'subPropA:filttest:sdf', 
'urn@bby@pcm@job@ingest@content@complete@count']], isCount=false,
--> Is the CF metadata properly returned for all the columns in the parsed 
statement?
--> The query becomes a range/slice of columns (SelectStatement:217)
  --> Each range must have a low and a high bound; are the right ones being 
selected?
  [SliceFromReadCommand(table='wayfair_submission', 
key='39323464363734322d333166642d313165332d393766372d3030316334323030303030392d434a4f42',
 column_parent='QueryPath(columnFamilyName='entity_job', 
superColumnName='null', columnName='null')', filter='SliceQueryFilter 
[reversed=false, slices=[[000c5f5f43505359535f6e616d6500, 
000c5f5f43505359535f6e616d6501], 
[001573756250726f70413a66696c74746573743a73646600, 
001573756250726f70413a66696c74746573743a73646601], 
[001975726e3a6262793a70636d3a696e676573743a73746174757300, 
001975726e3a6262793a70636d3a696e676573743a73746174757301], 
[002d75726e406262794070636d406a6f6240696e6765737440636f6e74656e7440636f6d706c65746540636f756e7400,
 
002d75726e406262794070636d406a6f6240696e6765737440636f6e74656e7440636f6d706c65746540636f756e7401]],
 count=10000, toGroup = 1]')]
   SliceFromReadCommand
      (table='wayfair_submission', 
       key='39323464363734322d333166642d313165332d393766372d3030316334323030303030392d434a4f42',
       column_parent='QueryPath(columnFamilyName='entity_job', superColumnName='null', columnName='null')', 
       filter='
         SliceQueryFilter [
           reversed=false, 
           slices=[
             [000c5f5f43505359535f6e616d6500, 000c5f5f43505359535f6e616d6501], 
             [001573756250726f70413a66696c74746573743a73646600, 001573756250726f70413a66696c74746573743a73646601], 
             [001975726e3a6262793a70636d3a696e676573743a73746174757300, 001975726e3a6262793a70636d3a696e676573743a73746174757301], 
             [002d75726e406262794070636d406a6f6240696e6765737440636f6e74656e7440636f6d706c65746540636f756e7400, 002d75726e406262794070636d406a6f6240696e6765737440636f6e74656e7440636f6d706c65746540636f756e7401]
           ], 
           count=10000, 
           toGroup = 1]'
      )

Judging by the lengths of the four slice-bound pairs above, they probably 
correspond to the four requested column names, since the bound lengths track 
the name lengths.
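
One way to check this is to decode a bound directly. The sketch below is only 
an illustration: it assumes each bound uses a composite layout of a 2-byte 
big-endian length, the component bytes, and a one-byte end-of-component 
marker. The helper name decode_composite is hypothetical, not a Cassandra API.

```python
import struct


def decode_composite(hex_bound):
    """Split a hex-encoded composite name into (text, end_of_component) pairs.

    Assumed layout per component: 2-byte big-endian length, the bytes
    themselves, then a single end-of-component byte.
    """
    raw = bytes.fromhex(hex_bound)
    parts = []
    pos = 0
    while pos < len(raw):
        (length,) = struct.unpack_from(">H", raw, pos)
        pos += 2
        text = raw[pos:pos + length].decode("utf-8")
        pos += length
        end_of_component = raw[pos]
        pos += 1
        parts.append((text, end_of_component))
    return parts


# First slice from the dump above: low bound, then high bound.
low = decode_composite("000c5f5f43505359535f6e616d6500")
high = decode_composite("000c5f5f43505359535f6e616d6501")
print(low, high)

# The row key in the dump is plain hex-encoded text.
key_hex = (
    "39323464363734322d333166642d313165332d"
    "393766372d3030316334323030303030392d434a4f42"
)
row_key = bytes.fromhex(key_hex).decode("utf-8")
print(row_key)
```

If that layout assumption holds, each pair is one requested p_prop value with 
a trailing 00 on the low bound and 01 on the high bound, which would suggest 
that all four names from the IN list do reach the read command.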

Row(
  key=
    DecoratedKey(8705314879532960628, 39323464363734322d333166642d313165332d393766372d3030316334323030303030392d434a4f42),
  cf=ColumnFamily(
    entity_job [
      __CPSYS_name::false:0@1381445090517000,
      __CPSYS_name:e_entname:false:10@1381445090517000,
      subPropA\:filttest\:sdf::false:0@1381445090517000,
      subPropA\:filttest\:sdf:p_flags:false:3@1381445090517000,
      subPropA\:filttest\:sdf:p_propid:false:36@1381445090517000,
      subPropA\:filttest\:sdf:p_proplinks:false:2@1381445090517000,
      subPropA\:filttest\:sdf:p_subents:false:2@1381445090517000,
      subPropA\:filttest\:sdf:p_val:false:22@1381445090517000,
      subPropA\:filttest\:sdf:p_vallang:false:5@1381445090517000,
      subPropA\:filttest\:sdf:p_vallinks:false:2@1381445090517000,
      subPropA\:filttest\:sdf:p_valtype:false:4@1381445090517000,
      subPropA\:filttest\:sdf:p_valunit:false:1@1381445090517000,
      subPropA\:filttest\:sdf:p_vars:false:84@1381445090517000,
      urn\:bby\:pcm\:ingest\:status:ingeststatus:false:4@1381445090517000,
      urn\:bby\:pcm\:ingest\:status:p_flags:false:3@1381445090517000,
      urn\:bby\:pcm\:ingest\:status:p_propid:false:36@1381445090517000,
      urn\:bby\:pcm\:ingest\:status:p_proplinks:false:2@1381445090517000,
      urn\:bby\:pcm\:ingest\:status:p_subents:false:2@1381445090517000,
      urn\:bby\:pcm\:ingest\:status:p_vallinks:false:2@1381445090517000,
      urn\:bby\:pcm\:ingest\:status:p_vars:false:70@1381445090517000,
    ]
  )
)

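The returned row has columns for only three of the four requested names; 
nothing comes back for 'urn@bby@pcm@job@ingest@content@complete@count'. Does 
that mean the CF metadata doesn't have the columns for the last one? Or is 
that just the data value?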
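
To make the comparison explicit, here is a rough sketch that takes the first 
component of a few cell names from the Row dump and lists the requested 
p_prop values that have no cell. It assumes the dump escapes a colon inside a 
component as \: and separates components with a bare colon, which is inferred 
from the output above rather than from any documented format. The helper name 
clustering_prefix is hypothetical.

```python
import re


def clustering_prefix(cell):
    """Return the first component of a dumped cell name, unescaping '\\:'."""
    first = re.split(r"(?<!\\):", cell, maxsplit=1)[0]
    return first.replace("\\:", ":")


# The four values from the IN clause of the debugged query.
requested = [
    "__CPSYS_name",
    "urn:bby:pcm:ingest:status",
    "subPropA:filttest:sdf",
    "urn@bby@pcm@job@ingest@content@complete@count",
]

# One sample cell per distinct prefix, copied from the Row dump.
cells = [
    r"__CPSYS_name::false:0@1381445090517000",
    r"subPropA\:filttest\:sdf:p_val:false:22@1381445090517000",
    r"urn\:bby\:pcm\:ingest\:status:p_flags:false:3@1381445090517000",
]

present = {clustering_prefix(cell) for cell in cells}
missing = [name for name in requested if name not in present]
print(missing)
```

This only shows which requested name is absent from the dump. It cannot tell 
whether the cells were never stored under that name or were dropped by the 
read path.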


> CQL3 SELECT IN CLAUSE inconsistent
> ----------------------------------
>
>                 Key: CASSANDRA-6137
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE
>            Reporter: Constance Eustace
>             Fix For: 2.0.1
>
>
> We are encountering inconsistent results from CQL3 queries with column keys 
> using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc 
> driver.
> Rowkey is e_entid
> Column key is p_prop
> This returns roughly 21 rows for 21 column keys that match p_prop.
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
> These three queries each return one row for the requested single column key 
> in the IN clause:
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:complete:count');
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:all:count');
> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:fail:count');
> This query returns ONLY ONE ROW (one column key), not three as I would expect 
> from the three-column-key IN clause:
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> This query does return two rows however for the requested two column keys:
> cqlsh> SELECT 
> e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
>  FROM internal_submission.Entity_Job WHERE e_entid = 
> '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_prop in 
> ('urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> cqlsh> describe table internal_submission.entity_job;
> CREATE TABLE entity_job (
>   e_entid text,
>   p_prop text,
>   describes text,
>   dndcondition text,
>   e_entlinks text,
>   e_entname text,
>   e_enttype text,
>   ingeststatus text,
>   ingeststatusdetail text,
>   p_flags text,
>   p_propid text,
>   p_proplinks text,
>   p_storage text,
>   p_subents text,
>   p_val text,
>   p_vallang text,
>   p_vallinks text,
>   p_valtype text,
>   p_valunit text,
>   p_vars text,
>   partnerid text,
>   referenceid text,
>   size int,
>   sourceip text,
>   submitdate bigint,
>   submitevent text,
>   userid text,
>   version text,
>   PRIMARY KEY (e_entid, p_prop)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='NONE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);
> CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job 
> (dndcondition);
> CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job 
> (ingeststatus);
> CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job 
> (ingeststatusdetail);
> CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job 
> (referenceid);
> CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);
> CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);
> -------------------------------
> My suspicion is that the three-column-key IN Clause is translated (improperly 
> or not) to a two-column key range with the assumption that the third column 
> key is present in that range, but it isn't...



--
This message was sent by Atlassian JIRA
(v6.1#6144)
