Spiros Ioannou created CASSANDRA-14672:
------------------------------------------

             Summary: After deleting data in 3.11.3, reads fail: "open marker 
and close marker have different deletion times"
                 Key: CASSANDRA-14672
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14672
             Project: Cassandra
          Issue Type: Bug
         Environment: CentOS 7, GCE, 9 nodes, 4TB disk/~2TB full each, level 
compaction, timeseries data
            Reporter: Spiros Ioannou


We perform routinely perform deletions as the one described below.

We had 3.11.0, then we upgraded to 3.11.3. After upgrading we run the following 
deletion query:

 
{code:java}
DELETE FROM measurement_events_dbl WHERE measurement_source_id IN ( 
9df798a2-6337-11e8-b52b-42010afa015a,  9df7717e-6337-11e8-b52b-42010afa015a, 
a08b8042-6337-11e8-b52b-42010afa015a, a08e52cc-6337-11e8-b52b-42010afa015a, 
a08e6654-6337-11e8-b52b-42010afa015a, a08e6104-6337-11e8-b52b-42010afa015a, 
a08e6c76-6337-11e8-b52b-42010afa015a, a08e5a9c-6337-11e8-b52b-42010afa015a, 
a08bcc50-6337-11e8-b52b-42010afa015a) AND year IN (2018) AND measurement_time 
>= '2018-07-19 04:00:00'{code}
 

Immediately after that, trying to read the last value produces an error:
{code:java}
select * FROM measurement_events_dbl WHERE measurement_source_id = 
a08b8042-6337-11e8-b52b-42010afa015a AND year IN (2018) order by 
measurement_time desc limit 1;
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] 
message="Operation failed - received 0 responses and 2 failures" 
info={'failures': 2, 'received_responses': 0, 'required_responses': 1, 
'consistency': 'ONE'}{code}
 

And the following exception:

 
{noformat}
WARN [ReadStage-4] 2018-08-29 06:59:53,505 
AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
Thread[ReadStage-4,5,main]: {}
java.lang.RuntimeException: java.lang.IllegalStateException: 
UnfilteredRowIterator for pvpms_mevents.measurement_events_dbl has an illegal 
RT bounds sequence: open marker and close marker have different deletion times
 at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2601)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_181]
 at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
 [apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
[apache-cassandra-3.11.3.jar:3.11.3]
 at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.lang.IllegalStateException: UnfilteredRowIterator for 
pvpms_mevents.measurement_events_dbl has an illegal RT bounds sequence: open 
marker and close marker have different deletion times
 at 
org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:103)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.applyToMarker(RTBoundValidator.java:81)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:148) 
~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:136)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:92)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:352) 
~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1889)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2597)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 ... 5 common frames omitted
 Suppressed: java.lang.IllegalStateException: UnfilteredRowIterator for 
pvpms_mevents.measurement_events_dbl has an illegal RT bounds sequence: 
expected all RTs to be closed, but the last one is open
 at 
org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:103)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.onPartitionClose(RTBoundValidator.java:96)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.transform.BaseRows.runOnClose(BaseRows.java:91) 
~[apache-cassandra-3.11.3.jar:3.11.3]
 at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:86) 
~[apache-cassandra-3.11.3.jar:3.11.3]
 at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:309)
 ~[apache-cassandra-3.11.3.jar:3.11.3]
 ... 12 common frames omitted
 
{noformat}
 

We have 9 nodes ~2TB each, leveled compaction, repairs run daily in sequence.

Table definition is:
{noformat}
CREATE TABLE pvpms_mevents.measurement_events_dbl (
 measurement_source_id uuid,
 year int,
 measurement_time timestamp,
 event_reception_time timestamp,
 quality double,
 value double,
 PRIMARY KEY ((measurement_source_id, year), measurement_time)
) WITH CLUSTERING ORDER BY (measurement_time ASC)
 AND bloom_filter_fp_chance = 0.1
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
 AND comment = ''
 AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
 AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND crc_check_chance = 1.0
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99PERCENTILE';{noformat}
 

We host those on GCE and recreated all the nodes with disk snapshots, and we 
reproduced the error: after re-running the DELETE the error was reproduced 
immediately.

 

We tried so far:

re-running repairs on all nodes and running nodetool garbagecollect with no 
success.

We are currently testing downgrade to 3.11.2 hoping it will work, please inform 
us if downgrading would cause issues.

 

This may be related to CASSANDRA-14515

 

 

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to