[
https://issues.apache.org/jira/browse/CASSANDRA-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911416#comment-13911416
]
Sylvain Lebresne commented on CASSANDRA-6668:
---------------------------------------------
To explain it without getting into implementation: the current semantic of an
UPDATE is that it implicitly sets every column of the PRIMARY KEY (marking the
presence of the PK columns is, after all, the main (and nowadays only) reason
for the row marker). So
{noformat}
UPDATE X WHERE id=11;
{noformat}
will always set the column {{id}} to 11, whatever {{X}} does and so
{noformat}
update ttl_issue set collection = collection - {'test_1000'} where id=11;
{noformat}
sets {{id}} without TTL, hence the end result.
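To make the effect concrete, here is a toy Python model of the semantic described above. It is only a sketch, not Cassandra's storage engine: the {{Row}} class, its {{marker_expiry}} and {{cells}} fields, the explicit {{now}} clock and the two scenario helpers are all assumptions made for illustration. Every UPDATE rewrites the row marker with the statement's TTL (none by default), and a row stays visible while either the marker or any element is live.

```python
import math


class Row:
    """Toy model of one CQL row: a row marker plus set elements, each with an expiry."""

    def __init__(self):
        self.marker_expiry = None  # None means no row marker has been written yet
        self.cells = {}            # element -> expiry time (math.inf = never expires)

    @staticmethod
    def _expiry(now, ttl):
        # TTL 0 (the default here) means "no expiration".
        return math.inf if ttl == 0 else now + ttl

    def update_add(self, element, now, ttl=0):
        # UPDATE implicitly sets the primary key: the marker takes this statement's TTL.
        self.marker_expiry = self._expiry(now, ttl)
        self.cells[element] = self._expiry(now, ttl)

    def update_remove(self, element, now, ttl=0):
        # Removing an element is still an UPDATE, so the marker is rewritten as well.
        self.marker_expiry = self._expiry(now, ttl)
        self.cells.pop(element, None)

    def live_elements(self, now):
        return {e for e, exp in self.cells.items() if exp > now}

    def visible(self, now):
        marker_live = self.marker_expiry is not None and self.marker_expiry > now
        return marker_live or bool(self.live_elements(now))


def scenario_1():
    # Two TTLed additions: marker and elements all expire.
    row = Row()
    row.update_add("test_2", now=0, ttl=2)
    row.update_add("test_3", now=0, ttl=3)
    return row


def scenario_2():
    # Same, followed by a removal without TTL, which rewrites the marker without TTL.
    row = Row()
    row.update_add("test_3", now=0, ttl=3)
    row.update_add("test_1000", now=0, ttl=1000)
    row.update_remove("test_1000", now=0)
    return row
```

In this model {{scenario_1().visible(now=10)}} is false, while {{scenario_2().visible(now=10)}} is true with an empty {{live_elements(now=10)}}, which corresponds to the {{11 | null}} row in the report.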
Now to be honest, in hindsight, I'm not sure it's the most intuitive behavior
possible, if only because it's not very explicit in the syntax (note that I'm
well aware of the historical reasons why things work the way they do, I'm just
trying to take a step back on the semantic). I think it would be more intuitive
for UPDATE to only set the columns in the SET clause, because that's what makes
the most sense imo. I.e. technically speaking, we would not insert the row
marker for UPDATE (we would for INSERT however).
That being said, changing that now would of course be a breaking change and we
should probably just stick to the current semantic. So anyway, it occurred to me
that this is one point where the semantic is probably not too intuitive and we
should at least make sure we document it properly.
bq. we should probably reject TTL = 0
I'd rather not. We've used 0 for no TTL since the beginning of TTLs and I don't
think it's much of a problem. I did push a quick update to the CQL doc
because arguably it wasn't properly documented, but I don't think it warrants
rejection. Especially since it's not at all impossible that it could break
users (I'm not suggesting anyone would use "TTL 0" in a query string, but it's
perfectly possible that someone uses a prepared statement with a bind marker
for the TTL, sometimes binding a strictly positive TTL and sometimes binding 0
to get no expiration).
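As an illustration of the usage pattern described above, the following Python sketch shows how client code might compute the values bound to such a prepared statement. The statement text, the {{ttl_bind_value}} and {{bind_values}} helpers and the choice of {{None}} to mean "no expiration" are all hypothetical; actually preparing and executing the statement with a driver is left out.

```python
from typing import Optional

# Statement text with a bind marker for the TTL; it would be prepared once with a driver.
ADD_ELEMENT = "UPDATE ttl_issue USING TTL ? SET collection = collection + ? WHERE id = ?"


def ttl_bind_value(expires_in: Optional[int]) -> int:
    """Map an optional lifetime in seconds to the value bound to the TTL marker."""
    if expires_in is None:
        return 0  # binding 0 means "no expiration"
    if expires_in <= 0:
        raise ValueError("use None for no expiration; lifetimes must be positive")
    return expires_in


def bind_values(row_id: int, element: str, expires_in: Optional[int] = None) -> tuple:
    # Order matches the three bind markers in ADD_ELEMENT.
    return (ttl_bind_value(expires_in), {element}, row_id)
```

Code written this way binds 0 on some calls and a positive TTL on others through the same prepared statement, so rejecting TTL = 0 would break it.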
bq. Second, we might want to leave the row marker alone (not overwrite it) for
DELETE and equivalent UPDATE queries.
DELETE never really does anything special with the row marker.
For UPDATE, well, I don't know. I'm usually a fan of limiting the number of
special cases the semantic has. As said above, my preferred semantic (all notions
of backward compatibility aside) would be to just never insert a row marker in
UPDATE. Short of that, the current semantic of "UPDATE also implicitly sets
every column of the PK", while less intuitive, has at least the merit of being
simple, consistent and easy to explain. Adding special cases to that, typically
"... except not if the operation is intrinsically a delete", would make it more
complex but I'm not sure it would make it more intuitive. It would also be
breaking strictly speaking, and if we're willing to break the semantic so it
gets more intuitive, I think I'd prefer going all the way to my "preferred"
semantic above.
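The three options discussed above can be compared on the reporter's second scenario with a small Python sketch. This is a toy model under stated assumptions, not Cassandra code: the policy labels {{always}} (current semantic), {{never}} (no row marker on UPDATE) and {{skip_deletes}} (the special case) are invented names, all statements are treated as applied at time 0, and the marker simply keeps the TTL of the last UPDATE that wrote it.

```python
import math


def row_visible(ops, policy, now):
    """Return whether the row is still returned at time `now`.

    ops:    list of (kind, element, ttl), all applied at time 0; kind is 'add' or 'remove'.
    policy: 'always', 'never' or 'skip_deletes' (hypothetical labels, see lead-in).
    """
    marker = None  # expiry of the row marker, None if never written
    cells = {}     # element -> expiry
    for kind, element, ttl in ops:
        expiry = math.inf if ttl == 0 else ttl  # TTL 0 means no expiration
        if policy == "always" or (policy == "skip_deletes" and kind == "add"):
            marker = expiry
        if kind == "add":
            cells[element] = expiry
        else:
            cells.pop(element, None)
    marker_live = marker is not None and marker > now
    return marker_live or any(exp > now for exp in cells.values())


# Scenario 2 from the report: two TTLed additions, then a removal without TTL.
SCENARIO_2 = [
    ("add", "test_3", 3),
    ("add", "test_1000", 1000),
    ("remove", "test_1000", 0),
]
```

In this model, at {{now=10}} the row is still visible under {{always}}, gone under {{never}}, and still visible under {{skip_deletes}}, where it lingers until the marker left by the TTL 1000 update expires. That last outcome is one way to see why the special case is not obviously more intuitive.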
> Inconsistent handling of row expiration using TTL in collections
> ----------------------------------------------------------------
>
> Key: CASSANDRA-6668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6668
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Apache Cassandra 2.0.3
> Apache Cassandra 1.2.8
> CQLSH client 3.1.6
> Reporter: DOAN DuyHai
> Priority: Critical
>
> The expiration of a row when all TTLed columns have expired is inconsistent.
> Scenario 1)
> {code:sql}
> cqlsh:test> create table ttl_issue(id int primary key,collection set<text>);
> cqlsh:test> update ttl_issue USING TTL 2 set collection = collection +
> {'test_2'} where id=10;
> cqlsh:test> update ttl_issue USING TTL 3 set collection = collection +
> {'test_3'} where id=10;
> cqlsh:test> select * from ttl_issue;
> id | collection
> ----+----------------------
> 10 | {'test_2', 'test_3'}
> cqlsh:test> select * from ttl_issue;
> id | collection
> ----+----------------------
> 10 | {'test_2', 'test_3'}
> cqlsh:test> select * from ttl_issue;
> id | collection
> ----+------------
> 10 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
> cqlsh:test>
> {code}
> As we can see, after a few seconds, both columns of the collection are
> expired. When all columns of the set have expired, the SELECT * FROM
> ttl_issue *returns no result, meaning that the whole row has expired.*
> Scenario 2)
> {code:sql}
> cqlsh:test> update ttl_issue USING TTL 3 set collection = collection +
> {'test_3'} where id=11;
> cqlsh:test> update ttl_issue USING TTL 1000 set collection = collection +
> {'test_1000'} where id=11;
> cqlsh:test> update ttl_issue set collection = collection - {'test_1000'}
> where id=11;
> cqlsh:test> select * from ttl_issue;
> id | collection
> ----+------------
> 11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
> id | collection
> ----+------------
> 11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
> id | collection
> ----+------------
> 11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
> id | collection
> ----+------------
> 11 | null
> {code}
> In this second scenario, we add elements to the collection with a TTL but then
> remove one of them. *After a while, although all TTLed columns have expired,
> the row is still there with only the primary key present.*
> One should expect to get the same behavior as in scenario 1), i.e. the
> complete row should expire.
> I've also tried removing one element from collection using TTL 0
> ({code:sql}update ttl_issue USING TTL 0 set collection = collection -
> {'test_1000'} where id=11;{code}) but the result is the same.
> Quick guess: a bug in the row deletion marker for specific collection element
> append/remove?