[ 
https://issues.apache.org/jira/browse/CASSANDRA-11680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-11680:
------------------------------------------
    Resolution: Invalid
        Status: Resolved  (was: Awaiting Feedback)

> Inconsistent data while paging through a table
> ----------------------------------------------
>
>                 Key: CASSANDRA-11680
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11680
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: siddharth verma
>
> We have the following table structure:
> CREATE TABLE keyspace.book_properties (
> book_id text,
> group_id bigint,
> property_display_name text,
> created timestamp,
> property_name text,
> property_uuid uuid,
> property_value text,
> updated timestamp,
> PRIMARY KEY (book_id, group_id, property_display_name)
> ) WITH CLUSTERING ORDER BY (group_id ASC, property_display_name ASC);
> We have Lucene indexes on group_id, property_display_name, created, 
> property_name, property_uuid, and updated.
> We run a full table scan using the following code snippet:
> boundStatement = new BoundStatement(session.prepare(
>     "select * from keyspace.book_properties"));
> boundStatement.setConsistencyLevel(ConsistencyLevel.ALL);
> boundStatement.setFetchSize(fetchSize);
> PagingState currentPageInfo = null;
> do {
>     try {
>         if (currentPageInfo != null) {
>             boundStatement.setPagingState(currentPageInfo);
>         }
>         ResultSet rs = session.execute(boundStatement);
>         processResultSet(rs);
>         currentPageInfo = rs.getExecutionInfo().getPagingState();
>     } catch (NoHostAvailableException e) {
>         // exception is ignored; currentPageInfo keeps its previous value
>     }
> } while (currentPageInfo != null);
> ......
> void processResultSet(ResultSet rs) {
>     // process only the rows of the current page, without triggering a fetch
>     int remaining = rs.getAvailableWithoutFetching();
>     if (remaining != 0) {
>         for (Row row : rs) {
>             processCassandraRow(row);
>             if (--remaining == 0) {
>                 break;
>             }
>         }
>     }
> }
> On many occasions this process returned corrupted data:
> 1. property_uuid was returned as null in many cases, although the stored 
> data had a value for it.
> 2. The value returned for property_uuid in the table scan differed from the 
> property_uuid seen in cqlsh.
> 3. The value returned for group_id in the table scan differed from the 
> group_id seen in cqlsh.
> book_properties has around 140 million records.
> book_properties receives heavy read, write, and update traffic while paging 
> is in progress.
> Cassandra version dsc3.0.3
> Side Note:
> For one of the inconsistent columns, we specifically checked writetime(..) 
> to find out whether the data had been changed while the job was in 
> progress. It had not.
> Query used to check case 2: select property_uuid, writetime(property_uuid) 
> from book_properties where book_id = 'BOOK31263786';
> Edit1:
> -> When we run "select * from book_properties where book_id = 
> 'BOOK31263786';" we get two records.
> -> When the pagination job matches and prints the rows where book_id = 
> 'BOOK31263786', we get 4 records.
> We speculate that the other two may have been deleted some time ago 
> (definitely not during the job), but we are not sure.
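
One detail of the quoted paging loop is worth illustrating: because the catch block is empty, a failure on the very first execute leaves currentPageInfo null and the loop ends without scanning anything. The sketch below is a minimal, driver-free illustration of a page loop that retries a failed page a bounded number of times and then fails loudly. It is not offered as an explanation of the reported inconsistency. Page, PageSource, inMemory, and the integer paging token are hypothetical stand-ins for the driver's ResultSet, Session.execute, and PagingState; none of them are DataStax driver APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PagingSketch {
    /** Hypothetical stand-in for one fetched page: rows plus a token for the next page (null on the last page). */
    static final class Page {
        final List<String> rows;
        final Integer nextState;

        Page(List<String> rows, Integer nextState) {
            this.rows = rows;
            this.nextState = nextState;
        }
    }

    /** Hypothetical stand-in for executing a statement with a paging state and fetch size. */
    interface PageSource {
        Page fetch(Integer pagingState, int fetchSize);
    }

    /** Walks every page in order; a page that keeps failing aborts the scan instead of ending it silently. */
    static List<String> scan(PageSource source, int fetchSize, int maxRetries) {
        List<String> seen = new ArrayList<>();
        Integer state = null;
        do {
            Page page = fetchWithRetry(source, state, fetchSize, maxRetries);
            seen.addAll(page.rows);   // consume exactly the rows of this page
            state = page.nextState;   // advance only after the page was processed
        } while (state != null);
        return seen;
    }

    /** Retries the same page up to maxRetries extra times, then rethrows with the last failure as the cause. */
    static Page fetchWithRetry(PageSource source, Integer state, int fetchSize, int maxRetries) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return source.fetch(state, fetchSize);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("page fetch failed after retries", last);
    }

    /** In-memory page source for demonstration; the paging token is simply the next list index. */
    static PageSource inMemory(List<String> data) {
        return (state, fetchSize) -> {
            int from = (state == null) ? 0 : state;
            int to = Math.min(from + fetchSize, data.size());
            Integer next = (to < data.size()) ? Integer.valueOf(to) : null;
            return new Page(new ArrayList<>(data.subList(from, to)), next);
        };
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(scan(inMemory(data), 2, 3));
    }
}
```

With a real driver the retry policy, the exception types, and the paging token would differ; the point of the sketch is only that the loop condition should not double as the error path.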



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
