[jira] [Updated] (CASSANDRA-11680) Inconsistent data while paging through a table

siddharth verma (JIRA) Thu, 28 Apr 2016 04:10:24 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-11680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


siddharth verma updated CASSANDRA-11680:
----------------------------------------
    Description: 
We have the following table structure:
CREATE TABLE keyspace.book_properties (
    book_id text,
    group_id bigint,
    property_display_name text,
    created timestamp,
    property_name text,
    property_uuid uuid,
    property_value text,
    updated timestamp,
    PRIMARY KEY (book_id, group_id, property_display_name)
) WITH CLUSTERING ORDER BY (group_id ASC, property_display_name ASC);

We have Lucene indexes on group_id, property_display_name, created, 
property_name, property_uuid and updated.

We run a full table scan over this table. Below is a sample code snippet:

boundStatement = new BoundStatement(
        session.prepare("select * from keyspace.book_properties"));
boundStatement.setConsistencyLevel(ConsistencyLevel.ALL);
boundStatement.setFetchSize(fetchSize);
PagingState currentPageInfo = null;
do {
    try {
        if (currentPageInfo != null) {
            // resume from where the previous page ended
            boundStatement.setPagingState(currentPageInfo);
        }
        ResultSet rs = session.execute(boundStatement);
        processResultSet(rs);
        currentPageInfo = rs.getExecutionInfo().getPagingState();
    } catch (NoHostAvailableException e) {
        // ignored; currentPageInfo keeps its previous value
    }
} while (currentPageInfo != null);
......
void processResultSet(ResultSet rs) {
    // consume only the rows already fetched for the current page
    int remaining = rs.getAvailableWithoutFetching();
    if (remaining != 0) {
        for (Row row : rs) {
            processCassandraRow(row);
            if (--remaining == 0) {
                break;
            }
        }
    }
}
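
The paging loop can also be sketched without the driver, which makes the
retry behaviour easier to reason about. In the sketch below, PagingSketch,
Page, PageFetcher and scan are hypothetical names, not driver API:
PageFetcher.fetch stands in for session.execute(...) with a paging state set,
and a plain String stands in for PagingState. In the loop as posted, an
exception on the very first request leaves currentPageInfo null and ends the
scan; this sketch instead retries the same page a bounded number of times.

```java
import java.util.List;
import java.util.function.Consumer;

final class PagingSketch {
    /** One page of rows plus the state needed to request the next page (null when done). */
    static final class Page<R> {
        final List<R> rows;
        final String nextState;

        Page(List<R> rows, String nextState) {
            this.rows = rows;
            this.nextState = nextState;
        }
    }

    /** Hypothetical stand-in for executing the statement with a given paging state. */
    interface PageFetcher<R> {
        Page<R> fetch(String pagingState);
    }

    /**
     * Walks every page and returns the number of rows handled. A failed fetch is
     * retried from the same paging state, at most maxRetries times in a row.
     */
    static <R> int scan(PageFetcher<R> fetcher, Consumer<R> handler, int maxRetries) {
        String state = null;      // null means "start from the first page"
        int processed = 0;
        int failures = 0;
        while (true) {
            Page<R> page;
            try {
                page = fetcher.fetch(state);
            } catch (RuntimeException e) {
                if (++failures > maxRetries) {
                    throw e;      // give up instead of swallowing the error
                }
                continue;         // retry the same page; state is unchanged
            }
            failures = 0;
            for (R row : page.rows) {
                handler.accept(row);
                processed++;
            }
            state = page.nextState;
            if (state == null) {
                return processed; // last page reached
            }
        }
    }
}
```

A real job would adapt PageFetcher to the driver session and pass
processCassandraRow as the handler.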

We frequently got corrupted data from this process:
1. property_uuid was returned as null in many cases, although the stored row 
had a value for it.
2. The property_uuid returned by the table scan differed from the 
property_uuid shown by cqlsh.
3. The group_id returned by the table scan differed from the group_id shown 
by cqlsh.
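
One way to record discrepancies like the three above is to compare each row
seen by the scan with the same row read directly by its primary key. The
sketch below is driver-independent and assumes both rows have already been
copied into column-name-to-value maps; RowDiff and diff are hypothetical
names, and the sample values in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

final class RowDiff {
    /**
     * Returns, for every column whose value differs between the two reads,
     * an entry "column -> scanValue != pointReadValue". A column missing
     * from one map is treated as null.
     */
    static Map<String, String> diff(Map<String, Object> fromScan,
                                    Map<String, Object> fromPointRead) {
        Map<String, String> differences = new LinkedHashMap<>();
        // Union of column names, sorted so the report order is stable.
        TreeSet<String> columns = new TreeSet<>(fromScan.keySet());
        columns.addAll(fromPointRead.keySet());
        for (String column : columns) {
            Object scanValue = fromScan.get(column);
            Object pointValue = fromPointRead.get(column);
            if (!Objects.equals(scanValue, pointValue)) {
                differences.put(column, scanValue + " != " + pointValue);
            }
        }
        return differences;
    }
}
```

For example, a scan row with property_uuid = null compared against a point
read that returns a UUID yields a single entry keyed by property_uuid, which
corresponds to case 1.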

book_properties has around 140 million records.

book_properties receives heavy read, write and update traffic while paging is 
in progress.

Cassandra version: dsc3.0.3

Side Note:
For one of the inconsistent columns, we specifically checked writetime(..) to 
confirm that the data had not been changed while the job was running. It had 
not been changed.
Checked for case 2: select property_uuid, writetime(property_uuid) from 
book_properties where book_id = 'BOOK31263786';
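
The writetime check can be expressed as a small helper. This is a sketch that
assumes writetime(..) returns microseconds since the Unix epoch, which is the
default unless clients supply their own timestamps; WritetimeCheck, toInstant
and writtenDuring are hypothetical names, and the job start and end times are
whatever the scan job recorded.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class WritetimeCheck {
    /** Converts a writetime value (assumed to be microseconds since the epoch) to an Instant. */
    static Instant toInstant(long writetimeMicros) {
        return Instant.EPOCH.plus(writetimeMicros, ChronoUnit.MICROS);
    }

    /** True if the write timestamp falls within [jobStart, jobEnd], bounds included. */
    static boolean writtenDuring(long writetimeMicros, Instant jobStart, Instant jobEnd) {
        Instant written = toInstant(writetimeMicros);
        return !written.isBefore(jobStart) && !written.isAfter(jobEnd);
    }
}
```

If writtenDuring returns false for a column that the scan returned with a
different value, a concurrent update of that column is an unlikely
explanation for the mismatch.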

Edit1:
-> When we run "select * from book_properties where book_id = 'BOOK31263786';" 
we get two records.
-> When the pagination job matches and prints rows where book_id = 
'BOOK31263786', we get 4 records.
We speculate that the other two records were deleted some time back 
(definitely not during the job). Again, this is speculation; we are not sure.
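
To see which rows the scan returns beyond those visible in cqlsh, the job can
record the clustering keys it encounters for a few watched partitions. The
sketch below is one possible way to do that; ScanRowTracker, record and
rowsFor are hypothetical names, and only explicitly watched book_id values
are kept so that memory use stays small on a 140-million-row table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ScanRowTracker {
    private final Set<String> watched;
    private final Map<String, List<String>> seen = new HashMap<>();

    ScanRowTracker(Set<String> watchedBookIds) {
        this.watched = new HashSet<>(watchedBookIds);
    }

    /** Remembers the clustering key of a scanned row if its partition is watched. */
    void record(String bookId, long groupId, String propertyDisplayName) {
        if (watched.contains(bookId)) {
            seen.computeIfAbsent(bookId, k -> new ArrayList<>())
                .add(groupId + "/" + propertyDisplayName);
        }
    }

    /** Clustering keys seen for a partition, in scan order; empty if none were recorded. */
    List<String> rowsFor(String bookId) {
        return seen.getOrDefault(bookId, Collections.emptyList());
    }
}
```

Comparing rowsFor("BOOK31263786") with the clustering keys returned by the
cqlsh query would show exactly which two rows are extra.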


> Inconsistent data while paging through a table
> ----------------------------------------------
>
>                 Key: CASSANDRA-11680
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11680
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: siddharth verma
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
