[
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769511#comment-17769511
]
Jaroslaw Kijanowski commented on CASSANDRA-18798:
-------------------------------------------------
Here is another example, this time with an anomaly on register "3":
{code:java}
{:type :ok, :process 2, :value [[:r 3 [2001 2002 4001]] [:append 3 2501]],
 :tid 5, :n 3, :time 1695632953765285595}
{:type :ok, :process 1, :value [[:append 3 503] [:append 4 504]],
 :tid 1, :n 3, :time 1695632953772299980}
{:type :ok, :process 10, :value [[:r 5 [4501 502 1 2]]
 [:r 3 [2001 2002 4001 503 2501]]], :tid 2, :n 3, :time 1695632953815153926}
{code}
The Elle checker prints:
{code:java}
Let:
T1 = {:index 46, :time 1695632953765285595, :type :ok, :process 2, :f nil,
:value [[:r 3 [2001 2002 4001]] [:append 3 2501]], :tid 5, :n 3}
T2 = {:index 47, :time 1695632953772299980, :type :ok, :process 1, :f nil,
:value [[:append 3 503] [:append 4 504]], :tid 1, :n 3}
Then:
- T1 < T2, because T1 did not observe T2's append of 503 to 3.
- However, T2 < T1, because T1 appended 2501 after T2 appended 503 to 3: a
contradiction! {code}
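The contradiction Elle reports can be re-derived by hand from the three operations above. The following is a minimal Python sketch, not Elle's actual algorithm: the transaction labels, the data layout and the helper `edges_for_key` are made up for illustration, and it assumes the longest read of register 3 (the one by process 10) reflects the final version order.

```python
# Hypothetical, simplified re-derivation of the T1/T2 cycle on register 3.
# This is not how Elle is implemented; it only mirrors its explanation.

# Longest observed read of register 3 (process 10), taken as the version order.
version_order = [2001, 2002, 4001, 503, 2501]

# Per transaction: what it read from register 3 (or None) and what it appended.
txns = {
    "T1": {"read": [2001, 2002, 4001], "append": 2501},
    "T2": {"read": None, "append": 503},
}


def edges_for_key(txns, version_order):
    """Return a set of (before, after, reason) dependency edges."""
    pos = {v: i for i, v in enumerate(version_order)}
    edges = set()
    for a, ta in txns.items():
        for b, tb in txns.items():
            if a == b:
                continue
            # Write-write: a's append sits before b's append in the list.
            if pos[ta["append"]] < pos[tb["append"]]:
                edges.add((a, b, "ww"))
            # Read-write (anti-dependency): a read the list without b's append.
            if ta["read"] is not None and tb["append"] not in ta["read"]:
                edges.add((a, b, "rw"))
    return edges


edges = edges_for_key(txns, version_order)

# Two transactions form a cycle if there are edges in both directions.
pairs = {(x, y) for x, y, _ in edges}
has_cycle = any((b, a) in pairs for a, b in pairs)

for before, after, reason in sorted(edges):
    print(f"{before} < {after} ({reason})")
print("cycle:", has_cycle)
```

Under these assumptions the sketch yields one edge in each direction between T1 and T2, which is the same contradiction the checker describes.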
An sstable dump:
{code:java}
{
"partition": {
"key": [
"3"
],
"position": 21943
},
"rows": [
{
"type": "row",
"position": 21961,
"cells": [
{
"name": "contents",
"path": [
"31c0b3e0-5b83-11ee-b108-ab7b34917b80"
],
"value": 2001,
"tstamp": "1695632953016000"
},
{
"name": "contents",
"path": [
"328ba500-5b83-11ee-a2ab-6fb50612de54"
],
"value": 2002,
"tstamp": "1695632954193000"
},
{
"name": "contents",
"path": [
"331a2960-5b83-11ee-b108-ab7b34917b80"
],
"value": 4001,
"tstamp": "1695632955126000"
},
{
"name": "contents",
"path": [
"331ce880-5b83-11ee-bca8-a35a0bbb26a1"
],
"value": 503,
"tstamp": "1695632955154000"
},
{
"name": "contents",
"path": [
"331d36a0-5b83-11ee-a2ab-6fb50612de54"
],
"value": 2501,
"tstamp": "1695632955147000"
},
...
]
}
]
} {code}
In short, the table order is:
{code:java}
2001 1695632953016000
2002 1695632954193000
4001 1695632955126000
503 1695632955154000
2501 1695632955147000{code}
If the cells were ordered by timestamp instead, the history would be valid:
{code:java}
2001 1695632953016000
2002 1695632954193000
4001 1695632955126000
2501 1695632955147000
503 1695632955154000
{code}
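The two orderings can be compared directly from the dump. The sketch below is an illustration under assumptions: it assumes the cell `path` values are version 1 (time-based) UUIDs and that the cell order in the dump follows the time embedded in them. It uses only Python's standard `uuid` module, and the variable names are made up for this example.

```python
import uuid

# (path, value, tstamp) triples copied from the sstable dump above.
cells = [
    ("31c0b3e0-5b83-11ee-b108-ab7b34917b80", 2001, 1695632953016000),
    ("328ba500-5b83-11ee-a2ab-6fb50612de54", 2002, 1695632954193000),
    ("331a2960-5b83-11ee-b108-ab7b34917b80", 4001, 1695632955126000),
    ("331ce880-5b83-11ee-bca8-a35a0bbb26a1", 503, 1695632955154000),
    ("331d36a0-5b83-11ee-a2ab-6fb50612de54", 2501, 1695632955147000),
]

# Assumption: list cells are ordered by the time component of the path UUID.
by_path = [
    value
    for _, value, _ in sorted(cells, key=lambda c: uuid.UUID(c[0]).time)
]

# Alternative: order by the cell write timestamp ("tstamp" in the dump).
by_tstamp = [value for _, value, _ in sorted(cells, key=lambda c: c[2])]

print("ordered by path UUID time:", by_path)
print("ordered by tstamp:        ", by_tstamp)
```

Under these assumptions the first ordering places 503 before 2501, as in the table order, while the second places 2501 before 503, as in the timestamp order.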
> Appending to list in Accord transactions uses insertion timestamp
> -----------------------------------------------------------------
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
> Issue Type: Bug
> Components: Accord
> Reporter: Jaroslaw Kijanowski
> Priority: Normal
> Attachments: image-2023-09-26-20-05-25-846.png
>
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class':
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY,contents
> LIST<bigint>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
> LET row = (SELECT * FROM list_append WHERE id = ?);
> SELECT row.contents;
> COMMIT TRANSACTION;
> BEGIN TRANSACTION
> UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in
> the edn format from a test.
> {code:java}
> {:type :invoke :process 8 :value [[:append 5 352]] :tid 3 :n 52
> :time 1692607285967116627}
> {:type :invoke :process 9 :value [[:r 5 nil]] :tid 1 :n 54
> :time 1692607286078732473}
> {:type :invoke :process 6 :value [[:append 5 553]] :tid 5 :n 53
> :time 1692607286133833428}
> {:type :invoke :process 7 :value [[:append 5 455]] :tid 4 :n 55
> :time 1692607286149702511}
> {:type :ok :process 8 :value [[:append 5 352]] :tid 3 :n 52
> :time 1692607286156314099}
> {:type :invoke :process 5 :value [[:r 5 nil]] :tid 9 :n 52
> :time 1692607286167090389}
> {:type :ok :process 9 :value [[:r 5 [303 304 604 6 306 509 909 409 912
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48
> 852 352]]] :tid 1 :n 54 :time 1692607286168657534}
> {:type :invoke :process 1 :value [[:r 5 nil]] :tid 0 :n 51
> :time 1692607286201762938}
> {:type :ok :process 7 :value [[:append 5 455]] :tid 4 :n 55
> :time 1692607286245571513}
> {:type :invoke :process 7 :value [[:r 5 nil]] :tid 4 :n 56
> :time 1692607286245655775}
> {:type :ok :process 5 :value [[:r 5 [303 304 604 6 306 509 909 409 912
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48
> 852 352 455]]] :tid 9 :n 52 :time 1692607286253928906}
> {:type :invoke :process 5 :value [[:r 5 nil]] :tid 9 :n 53
> :time 1692607286254095215}
> {:type :ok :process 6 :value [[:append 5 553]] :tid 5 :n 53
> :time 1692607286266263422}
> {:type :ok :process 1 :value [[:r 5 [303 304 604 6 306 509 909 409 912
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48
> 852 352 553 455]]] :tid 0 :n 51 :time 1692607286271617955}
> {:type :ok :process 7 :value [[:r 5 [303 304 604 6 306 509 909 409 912
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48
> 852 352 553 455]]] :tid 4 :n 56 :time 1692607286271816933}
> {:type :ok :process 5 :value [[:r 5 [303 304 604 6 306 509 909 409 912
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48
> 852 352 553 455]]] :tid 9 :n 53 :time 1692607286281483026}
> {:type :invoke :process 9 :value [[:r 5 nil]] :tid 1 :n 56
> :time 1692607286284097561}
> {:type :ok :process 9 :value [[:r 5 [303 304 604 6 306 509 909 409 912
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48
> 852 352 553 455]]] :tid 1 :n 56 :time 1692607286306445242}
> {code}
> Process 6 and process 7 append the values 553 and 455, respectively. The
> append of 455 succeeds and a read by process 5 confirms it. Then 553 is
> appended as well and a read by process 1 confirms that too; however, it
> sees 553 before 455.
> Process 5 reads [... 852 352 455] whereas process 1 reads [... 852 352 553
> 455], and the latter order is returned in subsequent reads as well.
> [~blambov] suggested that one reason for this behavior could be the way
> unfrozen lists are updated. The backing data type is a _kind of a map_ that
> uses insertion timestamps as indexes. These indexes are used to sort the
> list when it is composed from chunks from various sources/sstables before
> being returned to the client.
> In such a case it can indeed happen that process 5 reads [... 852 352 455]
> but process 1 later reads [... 852 352 553 455], because 553 has been
> _appended_ with an earlier timestamp than 455 but _committed_ with a later
> timestamp.
> Now with Accord we have the timestamp _of the transaction_ at hand. Could
> Accord use that for the index instead? Would that lead to the correct
> behavior? The value 553 has been appended after 455, and using the
> transaction id/timestamp as the list index would place it properly in the
> underlying map, wouldn't it?
> Steps to reproduce:
>
> {code:java}
> git clone https://github.com/datastax/accordclient.git
> cd accordclient
> git checkout append-to-list-index
> lein run --list-append -t 10 -r 1,2,3,4,5 -n 1000 -H <host-ips> \
>   -s `date +%s%N` > test-la.edn
>
> curl -L -o elle-cli.zip \
>   https://github.com/ligurio/elle-cli/releases/download/0.1.6/elle-cli-bin-0.1.6.zip
> unzip -d elle-cli elle-cli.zip
> java -jar elle-cli/target/elle-cli-0.1.6-standalone.jar --model list-append \
>   --anomalies G0 --consistency-models strict-serializable --directory out-la \
>   --verbose test-la.edn
> {code}
>