[ https://issues.apache.org/jira/browse/CASSANDRA-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310097#comment-15310097 ]

Sebastian Marsching commented on CASSANDRA-11000:
-------------------------------------------------

Thank you very much for the elaborate explanation. I agree that adding a lot of 
complexity to the code for a solution that would not cover all cases anyway 
does not make sense and would more likely become the source of even more 
problems. In this case, a clear warning in the documentation should be 
sufficient.

Just a short question to make sure that I understand the consequences of mixing 
LWT updates with non-LWT reads: Once an LWT update has succeeded (the operation 
has returned indicating success), I can be sure that a SELECT with a CL of 
QUORUM will also see that data. A SELECT with a CL of SERIAL is only necessary 
when I want to be sure that I also see LWT updates that have "failed" (e.g. 
because the client saw a timeout) but may still become visible later. Is this 
correct, or have I missed something?
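
To make the distinction concrete, here is roughly what I mean in cqlsh (just a 
sketch, using the test table from the issue description below):

{code}
-- after an LWT UPDATE that returned [applied] = True:
CONSISTENCY QUORUM;
SELECT v FROM test WHERE pk = 'foo' AND ck = 'bar';  -- expected to see the LWT write

-- only needed if I also want to observe LWT updates that timed out on the
-- client but may still be committed later:
CONSISTENCY SERIAL;
SELECT v FROM test WHERE pk = 'foo' AND ck = 'bar';  -- serial (Paxos) read
{code}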

I am asking because I have an application that does updates rarely (always 
using an LWT) but reads frequently. I guess this is a very common scenario for 
LWTs. It is not a problem for me if I do not see the latest state when the LWT 
update has reported a problem to the client; it is only important that I see 
the latest state once success has been indicated to the client performing the 
update.

> Mixing LWT and non-LWT operations can result in an LWT operation being 
> acknowledged but not applied
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11000
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11000
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>         Environment: Cassandra 2.1, 2.2, and 3.0 on Linux and OS X.
>            Reporter: Sebastian Marsching
>
> When mixing light-weight transaction (LWT, a.k.a. compare-and-set, 
> conditional update) operations with regular operations, it can happen that an 
> LWT operation is acknowledged (applied = True), even though the update has 
> not been applied and a SELECT operation still returns the old data.
> For example, consider the following table:
> {code}
> CREATE TABLE test (
>     pk text,
>     ck text,
>     v text,
>     PRIMARY KEY (pk, ck)
> );
> {code}
> We start with an empty table and insert data using a regular (non-LWT) 
> operation:
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> A following SELECT statement returns the data as expected. Now we do a 
> conditional update (LWT):
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> As expected, the update is applied and a following SELECT statement shows the 
> updated value.
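> For reference, the cqlsh output looks roughly like this (a sketch; the exact 
> formatting may differ between versions):
> {code}
>  [applied]
> -----------
>       True
>
>  pk  | ck  | v
> -----+-----+-----
>  foo | bar | 456
> {code}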
> Now we do the same but use a time stamp that is slightly in the future (e.g. 
> a few seconds) for the INSERT statement (obviously $time$ needs to be 
> replaced by a time stamp that is slightly ahead of the system clock).
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123') USING TIMESTAMP 
> $time$;
> {code}
> Now, running the same UPDATE statement still reports success (applied = True). 
> However, a subsequent SELECT yields the old value ('123') instead of the 
> updated value ('456'). Inspecting the time stamp of the value indicates that 
> it has not been replaced (the value from the original INSERT is still in 
> place).
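> The time stamps can be inspected with a query along these lines (a sketch; 
> same table and keys as above):
> {code}
> SELECT v, WRITETIME(v) FROM test WHERE pk = 'foo' AND ck = 'bar';
> {code}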
> This behavior is exhibited on a single-node cluster running Cassandra 2.1.11, 
> 2.2.4, and 3.0.1.
> Testing this on a multi-node cluster is a bit trickier, so I only tested it 
> with Cassandra 2.2.4. Here, I made one of the nodes lag behind in time by a 
> few seconds (using libfaketime) and used a replication factor of three for 
> the test keyspace. In this case, the behavior can be demonstrated even 
> without explicitly specifying a time stamp. Running
> {code}
> INSERT INTO test (pk, ck, v) VALUES ('foo', 'bar', '123');
> {code}
> on a node with the regular clock followed by
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> {code}
> on the node lagging behind results in the UPDATE reporting success while the 
> old value is still returned.
> Interestingly, everything works as expected when LWT operations are used 
> consistently: When running
> {code}
> UPDATE test SET v = '456' WHERE pk = 'foo' AND ck = 'bar' IF v = '123';
> UPDATE test SET v = '123' WHERE pk = 'foo' AND ck = 'bar' IF v = '456';
> {code}
> in an alternating fashion on two nodes (one with a "normal" clock, one with 
> the clock lagging behind), the updates are applied as expected. When checking 
> the time stamps ("{{SELECT WRITETIME(v) FROM test;}}"), one can see that the 
> time stamp is increased by just a single tick when the statement is executed 
> on the node lagging behind.
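> To illustrate what I mean by a single tick (the numbers below are purely 
> hypothetical): the write time after the update on the lagging node ends up 
> being the previous write time plus one rather than a value taken from that 
> node's clock, e.g.
> {code}
> SELECT v, WRITETIME(v) FROM test WHERE pk = 'foo' AND ck = 'bar';
>
>  v   | writetime(v)
> -----+------------------
>  456 | 1453900000000001
> {code}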
> I think that this problem is strongly related to (or maybe even the same as) 
> the one described in CASSANDRA-7801, even though CASSANDRA-7801 was mainly 
> concerned with a single-node cluster. However, the fact that this problem 
> still exists in current versions of Cassandra makes me suspect that either it 
> is a different problem or the original problem was not completely fixed by 
> the patch from CASSANDRA-7801.
> I found CASSANDRA-9655, which suggests removing the changes introduced with 
> CASSANDRA-7801 because they can be problematic under certain circumstances, 
> but I am not sure whether that is the right place to discuss the issue I am 
> experiencing. If you think it is, feel free to close this issue and update 
> the description of CASSANDRA-9655 instead.
> In my opinion, the best way to fix this problem would be to ensure that a 
> write that is part of an LWT always uses a time stamp that is at least one 
> tick greater than the time stamp of the existing data. As the existing data 
> has to be read for checking the condition anyway, I do not think that this 
> would cause any additional overhead. If this is not possible, I suggest 
> looking into whether we can somehow detect such a situation and at least 
> report failure (applied = False) for the LWT instead of reporting success.
> The latter solution would at least fix those cases where code checks the 
> success of an LWT before performing any further actions (e.g. because the LWT 
> is used to take some kind of lock). Currently, the code will assume that the 
> operation was successful (and thus, staying with the example, that it owns 
> the lock), while other processes running in parallel will see a different 
> state. It is my understanding that LWTs were designed to avoid exactly this 
> situation, but at the moment the assumptions most users make about LWTs do 
> not always hold.
> Until this issue is solved, I suggest at least updating the CQL documentation 
> to clearly state that LWTs / conditional updates are not safe if data has 
> previously been INSERTed / UPDATEd / DELETEd using non-LWT operations and 
> there is clock skew or time stamps in the future have been supplied 
> explicitly. This should at least keep some users from making wrong 
> assumptions about LWTs and not realizing it until their application fails in 
> an unsafe way.



