[ https://issues.apache.org/jira/browse/CASSANDRA-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310097#comment-15310097 ]
Sebastian Marsching commented on CASSANDRA-11000:
-------------------------------------------------

Thank you very much for the elaborate explanation. I agree that adding a lot of complexity to the code for a solution that would not cover all cases anyway does not make sense and is more likely to be the source of even more problems. In this case, having a clear warning in the documentation should be sufficient.

Just a short question to make sure that I understand the consequences of mixing LWT updates with non-LWT reads: once an LWT update has succeeded (the operation has returned indicating success), I can be sure that a SELECT with a CL of QUORUM will also see that data. A SELECT with a CL of SERIAL is only necessary when I want to be sure that I also see LWT updates that have "failed" (e.g. the client has seen a timeout) but will only become visible later. Is this correct, or have I missed something?

I am asking because I have an application that rarely does updates (always using an LWT) but reads frequently. I guess that this is a very common scenario for LWTs. It is not a problem for me if I do not see the latest state in case the LWT update has reported a problem to the client. It is only important that I see the latest state when success has been indicated to the client performing the update.
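To make the question concrete, this is how I would exercise the two read paths in cqlsh (reusing the test table from the quoted description below; note that CONSISTENCY is a cqlsh command, not part of CQL itself):

{code}
-- Conditional write through Paxos; the [applied] column in the result
-- tells the client whether the update took effect.
UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';

-- If my understanding is correct, a QUORUM read sees the update once
-- the LWT has reported success to the client.
CONSISTENCY QUORUM;
SELECT v FROM test WHERE pk = 'foo' AND ck = 'bar';

-- A SERIAL read additionally completes and observes any in-flight
-- Paxos proposal, e.g. an LWT whose client only saw a timeout.
CONSISTENCY SERIAL;
SELECT v FROM test WHERE pk = 'foo' AND ck = 'bar';
{code}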
> Mixing LWT and non-LWT operations can result in an LWT operation being
> acknowledged but not applied
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11000
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11000
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>         Environment: Cassandra 2.1, 2.2, and 3.0 on Linux and OS X.
>            Reporter: Sebastian Marsching
>
> When mixing light-weight transaction (LWT, a.k.a. compare-and-set,
> conditional update) operations with regular operations, it can happen that
> an LWT operation is acknowledged (applied = True) even though the update
> has not been applied and a SELECT operation still returns the old data.
> For example, consider the following table:
> {code}
> CREATE TABLE test (
>     pk text,
>     ck text,
>     v text,
>     PRIMARY KEY (pk, ck)
> );
> {code}
> We start with an empty table and insert data using a regular (non-LWT)
> operation:
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> A subsequent SELECT statement returns the data as expected. Now we perform
> a conditional update (LWT):
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> As expected, the update is applied and a subsequent SELECT statement shows
> the updated value.
> Now we do the same but use a time stamp that is slightly in the future
> (e.g. a few seconds) for the INSERT statement (obviously $time$ needs to be
> replaced by a time stamp that is slightly ahead of the system clock):
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123') USING TIMESTAMP $time$;
> {code}
> Now, running the same UPDATE statement still reports success (applied =
> True). However, a subsequent SELECT yields the old value ('123') instead of
> the updated value ('456'). Inspecting the time stamp of the value shows
> that it has not been replaced (the value from the original INSERT is still
> in place).
> This behavior is exhibited on a single-node cluster running Cassandra
> 2.1.11, 2.2.4, and 3.0.1.
> Testing this on a multi-node cluster is a bit more tricky, so I only tested
> it with Cassandra 2.2.4. Here, I made one of the nodes lag behind in time
> by a few seconds (using libfaketime). I used a replication factor of three
> for the test keyspace. In this case, the behavior can be demonstrated even
> without an explicitly specified time stamp. Running
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> on a node with the regular clock, followed by
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> on the node lagging behind, results in the UPDATE reporting success but the
> old value still being in place.
> Interestingly, everything works as expected when using LWT operations
> consistently: when running
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> UPDATE test SET v = '123' WHERE pk = 'foo' AND ck = 'bar' IF v = '456';
> {code}
> in an alternating fashion on two nodes (one with a "normal" clock, one with
> the clock lagging behind), the updates are applied as expected. When
> checking the time stamps ("{{SELECT WRITETIME(v) FROM test;}}"), one can
> see that the time stamp is increased by just a single tick when the
> statement is executed on the node lagging behind.
> I think that this problem is strongly related to (or maybe even the same
> as) the one described in CASSANDRA-7801, even though CASSANDRA-7801 was
> mainly concerned with a single-node cluster. However, the fact that this
> problem still exists in current versions of Cassandra makes me suspect that
> either it is a different problem or the original problem was not completely
> fixed by the patch from CASSANDRA-7801.
> I found CASSANDRA-9655, which suggests removing the changes introduced with
> CASSANDRA-7801 because they can be problematic under certain circumstances,
> but I am not sure whether that is the right place to discuss the issue I am
> experiencing. If you think it is, feel free to close this issue and update
> the description of CASSANDRA-9655.
> In my opinion, the best way to fix this problem would be to ensure that a
> write that is part of an LWT always uses a time stamp that is at least one
> tick greater than the time stamp of the existing data. As the existing data
> has to be read for checking the condition anyway, I do not think that this
> would cause additional overhead. If this is not possible, I suggest looking
> into whether we can somehow detect such a situation and at least report
> failure (applied = False) for the LWT instead of reporting success.
> The latter solution would at least fix those cases where code checks the
> success of an LWT before performing any further actions (e.g. because the
> LWT is used to take some kind of lock). Currently, such code will assume
> that the operation was successful (and thus, staying with the example, that
> it owns the lock), while other processes running in parallel will see a
> different state. It is my understanding that LWTs were designed to avoid
> exactly this situation, but at the moment the assumptions most users will
> make about LWTs do not always hold.
> Until this issue is resolved, I suggest at least updating the CQL
> documentation to state clearly that LWTs / conditional updates are not safe
> if data has previously been INSERTed / UPDATEd / DELETEd using non-LWT
> operations and there is clock skew, or if time stamps in the future have
> been supplied explicitly. This should at least save some users from making
> wrong assumptions about LWTs and not realizing it until their application
> fails in an unsafe way.
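To make the locking concern in the quoted description concrete, the usual LWT lock pattern looks roughly like this (the locks table, names, and values here are made up for illustration):

{code}
-- A hypothetical lock table; the presence of a row marks the lock as held.
CREATE TABLE locks (
    name text PRIMARY KEY,
    owner text
);

-- Acquire: succeeds only if nobody currently holds the lock.
INSERT INTO locks (name, owner) VALUES ('my-lock', 'client-a') IF NOT EXISTS;

-- Release: only the current owner may remove the row.
DELETE FROM locks WHERE name = 'my-lock' IF owner = 'client-a';
{code}

If the acquiring INSERT reports [applied] = True but its write loses to existing data with a newer time stamp, client-a proceeds as if it held the lock while everyone else sees a different state; this is exactly the situation LWTs are supposed to rule out.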
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)