[ 
https://issues.apache.org/jira/browse/CASSANDRA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016849#comment-13016849
 ] 

Shotaro Kamio commented on CASSANDRA-2406:
------------------------------------------

I've attached an experimental patch. With it applied, the problem no longer 
occurs, but the patch is inefficient when a large number of rows is requested.

The main problem was that the rows collected in ColumnFamilyStore.scan() can 
contain duplicates, so it returns fewer unique rows than requested. 
StorageProxy.scan() then asks the next range for more results, and as a 
result the last returned row is wrong.

As for the inefficiency of the patch: if the rows are added in order, the 
uniqueness check only needs to compare against the last row. But I don't know 
whether I can assume that order. If it can be assumed, please improve the 
patch accordingly.
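
As a rough illustration of the trade-off described above, here is a small 
Python sketch. It is not the attached patch (which is against Cassandra's 
Java code), and the names Row, dedup_with_set and dedup_adjacent are 
hypothetical. It only contrasts a uniqueness check against every collected 
row with a check against the last row, which is valid only if rows are 
known to arrive in key order.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a row: (row key, columns). Not Cassandra's Row class.
Row = Tuple[str, Dict[str, str]]


def dedup_with_set(rows: Iterable[Row], limit: int) -> List[Row]:
    """Keep the first occurrence of each key. Works for any input order,
    but has to remember every key collected so far."""
    seen = set()
    out: List[Row] = []
    for key, columns in rows:
        if key in seen:
            continue  # duplicate key, skip it
        seen.add(key)
        out.append((key, columns))
        if len(out) == limit:
            break
    return out


def dedup_adjacent(rows: Iterable[Row], limit: int) -> List[Row]:
    """Compare each row only with the last kept row. Correct only under the
    assumption that rows arrive in key order, so duplicates are adjacent."""
    out: List[Row] = []
    for key, columns in rows:
        if out and out[-1][0] == key:
            continue  # same key as the previous row, skip it
        out.append((key, columns))
        if len(out) == limit:
            break
    return out
```

If the ordering assumption does not hold, dedup_adjacent lets non-adjacent 
duplicates through, which is why the assumption has to be confirmed before 
relying on the cheaper check.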


> Secondary index and index expression problems
> ---------------------------------------------
>
>                 Key: CASSANDRA-2406
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2406
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>         Environment: CentOS 5.5 (64bit), JDK 1.6.0_23
>            Reporter: Muga Nishizawa
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.5
>
>         Attachments: CASSANDRA-2406-debug.patch, create_table.cli, 
> secondary_index_checkv2.py, secondary_index_insertv2.py
>
>
> When I iteratively get data with a secondary index and an index clause, the 
> result acquired with consistency level "one" is different from the one 
> acquired with consistency level "quorum".  The result with consistency level 
> "one" is correct.  But the one with consistency level "quorum" is incorrect 
> and is dropped by Cassandra.  
> You can reproduce the bug by executing attached programs.
> - 1. Start Cassandra cluster.  It consists of 3 cassandra nodes and 
> distributes data by ByteOrderedPartitioner.  Initial tokens of those nodes 
> are ["31", "32", "33"].  
> - 2. Create keyspace and column family, according to "create_table.cli",
> - 3. Execute "secondary_index_insertv2.py", inserting a few hundred columns 
> to cluster
> - 4. Execute "secondary_index_checkv2.py" and get data with secondary index 
> and index clause iteratively.  "secondary_index_insertv2.py" and 
> "secondary_index_checkv2.py" require pycassa.
> In step 4, you can run the "secondary_index_checkv2.py" script with the 
> following option to get data with consistency level "one".  
> % python "secondary_index_checkv2.py" -one
> On the other hand, to acquire data with consistency level "quorum", use the 
> following option.  
> % python "secondary_index_checkv2.py" -quorum
> You can then check that the result acquired with consistency level "one" is 
> different from the one acquired with consistency level "quorum".  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
