[ 
https://issues.apache.org/jira/browse/CASSANDRA-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001249#comment-14001249
 ] 

Bill Mitchell commented on CASSANDRA-7105:
------------------------------------------

Thank you for taking a look at this, Dave, as I was not yet ambitious enough to 
delve into this.  

I tried applying the attached 7105.txt patch to the current 2.0 branch (which 
is labelled 2.0.9).  The code in 2.0 is slightly different, but I'm assuming 
the fix is parallel:
{code}
    private List<ByteBuffer> buildBound(Bound bound,
                                        Collection<CFDefinition.Name> names,
                                        Restriction[] restrictions,
                                        boolean isReversed,
                                        ColumnNameBuilder builder,
                                        List<ByteBuffer> variables) throws InvalidRequestException
    {
    ...
                        s.add((b == Bound.END && copy.remainingCount() > 0)
                              ? copy.buildAsEndOfRange()
                              : copy.build());
                    }
                    this.isReversed ^= isReversedType(name);
                    return new ArrayList<ByteBuffer>(s);
                }
{code}
It appears to me that we have only a partial fix to this problem.

Going back to the initial problem description, where there were two IN 
operators in the predicate, the SELECT specifying the two partition id values 
and two emailcrypt values should return 2 rows.  The initial fault was that no 
rows were returned.  With the patch, one row is returned instead of two: the 
row from partition 1.  
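
To spell out the result I expect, here is a minimal Java sketch of the intended 
semantics.  It is only an in-memory model, not Cassandra code: the class 
ExpectedInSemantics, the Row type, and the select and initialRows helpers are 
hypothetical names made up for illustration.  It assumes a row qualifies when 
its partition is in the first IN list and its emailCrypt is in the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a hypothetical in-memory model, not Cassandra code.
class ExpectedInSemantics {
    // Hypothetical stand-in for one sr2 row; siteID, listID and createDate
    // are omitted because they are identical in every test row.
    static final class Row {
        final int partition;
        final String emailCrypt;
        final String emailAddr;

        Row(int partition, String emailCrypt, String emailAddr) {
            this.partition = partition;
            this.emailCrypt = emailCrypt;
            this.emailAddr = emailAddr;
        }
    }

    // The two rows from the original problem description.
    static List<Row> initialRows() {
        return Arrays.asList(
            new Row(1, "5fe7719229092cdde4526afbc65c900c", "xyzzy"),
            new Row(2, "97bf28af2ca9c498d6e47237bb8680bf", "noname"));
    }

    // A row qualifies when its partition is in the first IN list
    // and its emailCrypt is in the second IN list.
    static List<String> select(List<Row> rows, Set<Integer> partitions, Set<String> emailCrypts) {
        List<String> emailAddrs = new ArrayList<String>();
        for (Row row : rows) {
            if (partitions.contains(row.partition) && emailCrypts.contains(row.emailCrypt)) {
                emailAddrs.add(row.emailAddr);
            }
        }
        return emailAddrs;
    }

    public static void main(String[] args) {
        Set<Integer> partitions = new HashSet<Integer>(Arrays.asList(1, 2));
        Set<String> emailCrypts = new HashSet<String>(Arrays.asList(
            "97bf28af2ca9c498d6e47237bb8680bf",
            "5fe7719229092cdde4526afbc65c900c"));
        // Both rows qualify, so two addresses are printed.
        System.out.println(select(initialRows(), partitions, emailCrypts));
    }
}
```

Under this model the query with partition IN (1,2) and both emailCrypt values 
selects both rows, which is the behaviour I am comparing the patched build 
against.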

Trying to explore what might be happening, I added a third row:  
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 1, 'noname2', 
'98bf28af2ca9c498d6e47237bb8680c0', '2014-04-28T14:05:59.236-0500');

When I did a select requesting the three email values from the two partitions, 
it returned only the first two and not the third.  If I specified only a 
single partition value, partition IN (1), it returned only 1 row, not the two 
now expected (partition 1 now holds both 'xyzzy' and 'noname2').  

I then added another row in a new partition 3.  This time (perhaps in error on 
my part) I included both the insert and the subsequent select in a single 
request in DataStax DevCenter:
insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, createDate) 
values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 3, 'noname3', 
'99bf28af2ca9c498d6e47237bb8680c1', '2014-04-28T14:05:59.236-0500');
select emailCrypt, emailAddr from sr2 where siteID = 
'4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
('5fe7719229092cdde4526afbc65c900c','99bf28af2ca9c498d6e47237bb8680c1','97bf28af2ca9c498d6e47237bb8680bf','98bf28af2ca9c498d6e47237bb8680c0');

This timed out, with an exception in the Cassandra output window:
14/05/18 17:36:33 ERROR service.CassandraDaemon: Exception in thread Thread[ReadStage:137,5,main]
java.lang.AssertionError: Added column does not sort as the last column
        at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:115)
        at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:116)
        at org.apache.cassandra.db.ColumnFamily.addIfRelevant(ColumnFamily.java:110)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:205)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1541)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1370)
        at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1348)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1912)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
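
As a way of picturing what this assertion message suggests, here is a small 
Java sketch of an append-only container that rejects a name sorting before the 
last one already added.  This is not Cassandra's code and I have not confirmed 
it is the cause here: AppendOnlySortedNames, append, and sortedFor are 
hypothetical names, plain strings stand in for column names, and a reversed 
String comparator stands in for the DESC clustering order on emailCrypt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustration only: a hypothetical model of an append-only sorted container.
class AppendOnlySortedNames {
    private final Comparator<String> comparator;
    private final List<String> names = new ArrayList<String>();

    AppendOnlySortedNames(Comparator<String> comparator) {
        this.comparator = comparator;
    }

    // Accept a name only if it does not sort before the current last name.
    void append(String name) {
        if (!names.isEmpty() && comparator.compare(names.get(names.size() - 1), name) > 0) {
            throw new IllegalStateException("Added column does not sort as the last column");
        }
        names.add(name);
    }

    List<String> names() {
        return names;
    }

    // Return a copy of the IN values ordered by the container's comparator.
    static List<String> sortedFor(Comparator<String> comparator, List<String> inValues) {
        List<String> copy = new ArrayList<String>(inValues);
        Collections.sort(copy, comparator);
        return copy;
    }

    public static void main(String[] args) {
        // Assumption: reversed natural order models the DESC clustering order.
        Comparator<String> desc = Collections.reverseOrder();
        List<String> inValues = Arrays.asList(
            "5fe7719229092cdde4526afbc65c900c",
            "99bf28af2ca9c498d6e47237bb8680c1",
            "97bf28af2ca9c498d6e47237bb8680bf",
            "98bf28af2ca9c498d6e47237bb8680c0");

        // Values ordered by the comparator are all accepted.
        AppendOnlySortedNames ok = new AppendOnlySortedNames(desc);
        for (String value : sortedFor(desc, inValues)) {
            ok.append(value);
        }
        System.out.println(ok.names());

        // Values appended in query order violate the ordering and are rejected.
        AppendOnlySortedNames bad = new AppendOnlySortedNames(desc);
        try {
            for (String value : inValues) {
                bad.append(value);
            }
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If something similar applies on the read path, the names built from the IN 
values would have to reach the column container in comparator order, with the 
reversed clustering order taken into account.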

If it helps in locating the problem, in my full application test, I am also 
seeing a problem where a row is being returned that does not match any of the 
values in the IN restriction.  I've not yet reduced this to a simple test case. 
 



> SELECT with IN on final column of composite and compound primary key fails
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7105
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7105
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: DataStax Cassandra 2.0.7
> Windows dual-core laptop
>            Reporter: Bill Mitchell
>         Attachments: 7105.txt
>
>
> I have a failing sequence where I specify an IN constraint on the final int 
> column of the composite primary key and an IN constraint on the final String 
> column of the compound primary key and no rows are returned, when rows should 
> be returned.  
> {noformat}
> CREATE TABLE IF NOT EXISTS sr2 (siteID TEXT, partition INT, listID BIGINT, 
> emailAddr TEXT, emailCrypt TEXT, createDate TIMESTAMP, removeDate TIMESTAMP, 
> removeImportID BIGINT, properties TEXT, PRIMARY KEY ((siteID, listID, 
> partition), createDate, emailCrypt) ) WITH CLUSTERING ORDER BY (createDate 
> DESC, emailCrypt DESC)  AND compression = {'sstable_compression' : 
> 'SnappyCompressor'} AND compaction = {'class' : 
> 'SizeTieredCompactionStrategy'};
> insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, 
> createDate) values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 1, 'xyzzy', 
> '5fe7719229092cdde4526afbc65c900c', '2014-04-28T14:05:59.236-0500');
> insert into sr2 (siteID, listID, partition, emailAddr, emailCrypt, 
> createDate) values ('4ca4f79e-3ab2-41c5-ae42-c7009736f1d5', 34, 2, 'noname', 
> '97bf28af2ca9c498d6e47237bb8680bf', '2014-04-28T14:05:59.236-0500');
> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
> createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
> '97bf28af2ca9c498d6e47237bb8680bf';
>  emailcrypt                       | emailaddr
> ----------------------------------+-----------
>  97bf28af2ca9c498d6e47237bb8680bf |    noname
> (1 rows)
> select emailCrypt, emailAddr  from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
> createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt = 
> '5fe7719229092cdde4526afbc65c900c';
>  emailcrypt                       | emailaddr
> ----------------------------------+-----------
>  5fe7719229092cdde4526afbc65c900c |     xyzzy
> (1 rows)
> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
> and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
> ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
> (0 rows)
> cqlsh:test_multiple_in> select * from sr2;
>  siteid                               | listid | partition | createdate                               | emailcrypt | emailaddr                        | properties | removedate | removeimportid
> --------------------------------------+--------+-----------+------------------------------------------+------------+----------------------------------+------------+------------+----------------
>  4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 |     34 |         2 | 2014-04-28 14:05:59Central Daylight Time |     noname | 97bf28af2ca9c498d6e47237bb8680bf |       null |       null |           null
>  4ca4f79e-3ab2-41c5-ae42-c7009736f1d5 |     34 |         1 | 2014-04-28 14:05:59Central Daylight Time |      xyzzy | 5fe7719229092cdde4526afbc65c900c |       null |       null |           null
> (2 rows)
> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
> and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
> ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
> (0 rows)
> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 1 and 
> createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
> ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
> (0 rows)
> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition = 2 and 
> createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
> ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
> (0 rows)
> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
> and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
> ('97bf28af2ca9c498d6e47237bb8680bf','5fe7719229092cdde4526afbc65c900c');
> (0 rows)
> cqlsh:test_multiple_in> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
> and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
> ('97bf28af2ca9c498d6e47237bb8680bf');
>  emailcrypt                       | emailaddr
> ----------------------------------+-----------
>  97bf28af2ca9c498d6e47237bb8680bf |    noname
> (1 rows)
> cqlsh:test_multiple_in> select emailCrypt, emailAddr from sr2 where siteID = 
> '4ca4f79e-3ab2-41c5-ae42-c7009736f1d5' and listID = 34 and partition IN (1,2) 
> and createDate = '2014-04-28T14:05:59.236-0500' and emailCrypt IN 
> ('5fe7719229092cdde4526afbc65c900c');
>  emailcrypt                       | emailaddr
> ----------------------------------+-----------
>  5fe7719229092cdde4526afbc65c900c |     xyzzy
> (1 rows)
> {noformat}
> As you can see, when I specify IN on the final primary column, no rows are 
> returned, even when I specify equality on the partition column.  If I use IN 
> to constrain the partition column but simple equality on the final column, 
> one row is returned for each of the possible values.  
> This appears to be a variation on CASSANDRA-6327 but with a String as the 
> final primary key column.  I initially saw this with a blob as the final 
> primary key column, so the issue is not exclusive to String.  When I tried a 
> real simple case with ints throughout, that worked fine.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
