[ 
https://issues.apache.org/jira/browse/CASSANDRA-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911416#comment-13911416
 ] 

Sylvain Lebresne commented on CASSANDRA-6668:
---------------------------------------------

To explain it without getting into implementation, the current semantics of an 
UPDATE is that it implicitly sets every column of the PRIMARY KEY (marking the 
presence of the PK columns is, after all, the main (and nowadays only) reason 
for the row marker). So
{noformat}
UPDATE X WHERE id=11;
{noformat}
will always set the column {{id}} to 11, whatever {{X}} does and so
{noformat}
update ttl_issue set collection = collection - {'test_1000'} where id=11;
{noformat}
sets {{id}} without TTL, hence the end result.
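
A toy model may make this concrete. The Python sketch below is not Cassandra's implementation: it only assumes that a row is a bag of cells, each with an optional expiry, and that every UPDATE also rewrites a marker cell with the TTL of that UPDATE. The names {{ROW_MARKER}}, {{update}} and {{visible_cells}} are made up for the illustration.

```python
# Toy model of the UPDATE semantics described above. NOT Cassandra's code:
# a row is a dict mapping a cell name to its expiry time (None = no TTL).
ROW_MARKER = "row-marker"  # hypothetical name for the cell that marks the PK


def update(row, now, add=(), remove=(), ttl=0):
    """Apply one UPDATE at time `now`; ttl == 0 means "no TTL", as in CQL."""
    expiry = now + ttl if ttl > 0 else None
    for cell in add:
        row[cell] = expiry
    for cell in remove:
        row.pop(cell, None)
    # The implicit part: every UPDATE also (re)writes the row marker,
    # using the TTL of that UPDATE and replacing any earlier expiry.
    row[ROW_MARKER] = expiry


def visible_cells(row, now):
    """Return the cells (marker included) that have not expired at `now`."""
    return {c for c, expiry in row.items() if expiry is None or expiry > now}


# Scenario 1 (id=10): both UPDATEs carry a TTL, so the marker expires too.
row10 = {}
update(row10, now=0, add=["test_2"], ttl=2)
update(row10, now=0, add=["test_3"], ttl=3)

# Scenario 2 (id=11): the last UPDATE has no TTL, so the marker never expires.
row11 = {}
update(row11, now=0, add=["test_3"], ttl=3)
update(row11, now=0, add=["test_1000"], ttl=1000)
update(row11, now=0, remove=["test_1000"])

print(visible_cells(row10, now=10))  # set(): nothing is live, the row is gone
print(visible_cells(row11, now=10))  # {'row-marker'}: id with a null collection
```

In this model the only difference between the two scenarios is the TTL carried by the last UPDATE, which is what decides whether the marker outlives the data cells.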

Now to be honest, in hindsight, I'm not sure it's the most intuitive behavior 
possible, if only because it's not very explicit in the syntax (note that I'm 
well aware of the historical reasons why things work the way they do, I'm just 
trying to take a step back on the semantics). I think it would be more intuitive 
for UPDATE to only set the columns in the SET clause, because that's what makes 
the most sense imo. I.e. technically speaking, we would not insert the row marker 
for UPDATE (we would for INSERT however).
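
To compare the two semantics side by side, here is a small Python sketch. It is a toy model under stated assumptions, not Cassandra's code: a row is a dict of cells with optional expiry times, and the hypothetical {{set_marker}} flag stands for "this write also writes the row marker". All names are invented for the illustration.

```python
# Toy comparison of the two semantics. NOT Cassandra's code; all names are
# hypothetical. A row maps a cell name to its expiry time (None = no TTL).
ROW_MARKER = "row-marker"


def write(row, now, add=(), remove=(), ttl=0, set_marker=True):
    """Apply one write at `now`; set_marker says whether it touches the marker."""
    expiry = now + ttl if ttl > 0 else None
    for cell in add:
        row[cell] = expiry
    for cell in remove:
        row.pop(cell, None)
    if set_marker:
        row[ROW_MARKER] = expiry


def row_exists(row, now):
    """A row exists while at least one of its cells (marker included) is live."""
    return any(expiry is None or expiry > now for expiry in row.values())


def replay_id_11(update_sets_marker):
    """Replay the three id=11 UPDATEs from the report, then look at t=10."""
    row = {}
    write(row, 0, add=["test_3"], ttl=3, set_marker=update_sets_marker)
    write(row, 0, add=["test_1000"], ttl=1000, set_marker=update_sets_marker)
    write(row, 0, remove=["test_1000"], set_marker=update_sets_marker)
    return row_exists(row, now=10)


print(replay_id_11(update_sets_marker=True))   # True: current semantics, row lingers
print(replay_id_11(update_sets_marker=False))  # False: SET-only semantics, row expires
```

Under either semantics an INSERT would correspond to {{write(..., set_marker=True)}} in this model; only UPDATE changes.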

That being said, changing that now would of course be a breaking change and we 
should probably just stick to the current semantics. So anyway, it occurred to me 
that this is one point where the semantics is probably not too intuitive and we 
might at least make sure we properly document it.

bq. we should probably reject TTL = 0

I'd rather not. We've used 0 for no TTL since the beginning of TTLs and I don't 
think it's much of a problem. I did push a quick update to the CQL doc 
because arguably it wasn't properly documented, but I don't think it warrants 
rejection. Especially since it's not at all impossible that it could break 
users (I'm not suggesting anyone would use "TTL 0" in a query string, but it's 
perfectly possible that someone uses a prepared statement with a bind marker 
for the TTL, sometimes binding a strictly positive TTL and sometimes binding 0 
to get no expiration). 
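
As an illustration of that usage pattern, here is a minimal Python sketch of application code that relies on 0 meaning "no TTL". It does not call any driver; the statement text and the helpers {{ttl_to_bind}} and {{bind_values}} are hypothetical, and only the table and column names come from the report.

```python
from typing import Optional, Tuple

# Hypothetical prepared-statement text with a bind marker for the TTL.
UPDATE_WITH_TTL = (
    "UPDATE ttl_issue USING TTL ? SET collection = collection + ? WHERE id = ?"
)


def ttl_to_bind(ttl_seconds: Optional[int]) -> int:
    """Map an optional expiration to the value bound to the TTL marker.

    None means "no expiration" and is sent as 0, which is exactly the case
    that rejecting TTL = 0 would break.
    """
    if ttl_seconds is None:
        return 0
    if ttl_seconds < 0:
        # A choice made by this helper; it refuses negative values up front.
        raise ValueError("TTL must be >= 0")
    return ttl_seconds


def bind_values(row_id: int, element: str,
                ttl_seconds: Optional[int] = None) -> Tuple[int, set, int]:
    """Build the values for UPDATE_WITH_TTL, in the order of its bind markers."""
    return (ttl_to_bind(ttl_seconds), {element}, row_id)


print(bind_values(11, "test_3", ttl_seconds=3))  # (3, {'test_3'}, 11)
print(bind_values(11, "test_forever"))           # (0, {'test_forever'}, 11)
```

The same prepared statement serves both the expiring and the non-expiring case, so such code would start failing if 0 were rejected.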

bq. Second, we might want to leave the row marker alone (not overwrite it) for 
DELETE and equivalent UPDATE queries.

DELETE never really does anything special with the row marker.

For UPDATE, well, I don't know. I'm usually a fan of limiting the number of 
special cases the semantics has. As said above, my preferred semantics (all 
notion of backward compatibility aside) would be to just never insert a row 
marker in UPDATE. Short of that, the current semantics of "UPDATE also 
implicitly sets every column of the PK", while less intuitive, has at least the 
merit of being simple, consistent and easy to explain. Adding special cases to 
that, typically "... except not if the operation is intrinsically a delete", 
would make it more complex but I'm not sure it would make it more intuitive. It 
would also be breaking strictly speaking, and if we're willing to break the 
semantics so it gets more intuitive, I think I'd prefer going all the way to my 
"preferred" semantics above.




> Inconsistent handling of row expiration using TTL in collections
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-6668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Apache Cassandra 2.0.3
> Apache Cassandra 1.2.8
> CQLSH client 3.1.6
>            Reporter: DOAN DuyHai
>            Priority: Critical
>
> The expiration of a row when all TTLed columns have expired is inconsistent.
> Scenario 1)
> {code:sql}
> cqlsh:test> create table ttl_issue(id int primary key,collection set<text>);
> cqlsh:test> update ttl_issue USING TTL 2 set collection = collection + 
> {'test_2'} where id=10;
> cqlsh:test> update ttl_issue USING TTL 3 set collection = collection + 
> {'test_3'} where id=10;
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+----------------------
>  10 | {'test_2', 'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+----------------------
>  10 | {'test_2', 'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  10 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
> cqlsh:test> 
> {code}
>  As we can see, after a few seconds, both columns of the collection are 
> expired. When all columns of the set have expired, the SELECT * FROM 
> ttl_issue *returns no result, meaning that the whole row has expired.*
> Scenario 2)
> {code:sql}
> cqlsh:test> update ttl_issue USING TTL 3 set collection = collection + 
> {'test_3'} where id=11;
> cqlsh:test> update ttl_issue USING TTL 1000 set collection = collection + 
> {'test_1000'} where id=11;
> cqlsh:test> update ttl_issue set collection = collection - {'test_1000'} 
> where id=11;
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 | {'test_3'}
> cqlsh:test> select * from ttl_issue;
>  id | collection
> ----+------------
>  11 |       null
> {code}
>  In this second scenario, we add elements to the collection with a TTL but then 
> remove one of them. *After a while, although all TTLed columns have expired, 
> the row is still there with only the primary key present.*
>  One should expect to get the same behavior as in scenario 1), i.e. the 
> complete row should expire.
>  I've also tried removing one element from the collection using TTL 0 
> ({code:sql}update ttl_issue USING TTL 0 set collection = collection - 
> {'test_1000'} where id=11;{code}) but the result is the same.
>  Quick guess: a bug in the row deletion marker for specific collection element 
> append/remove?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
