[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Michallat updated CASSANDRA-10786:
------------------------------------------
    Description: 
*_Initial description:_*
This is a follow-up to CASSANDRA-7910, which was about invalidating a prepared 
statement when the table is altered, to force clients to update their local 
copy of the metadata.

There's still an issue if multiple clients are connected to the same host. The 
first client to execute the query after the cache was invalidated will receive 
an UNPREPARED response, re-prepare, and update its local metadata. But other 
clients might miss it entirely (the MD5 hasn't changed), and they will keep 
using their old metadata. For example:
# {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123; clientA 
and clientB both have a local cache of the metadata (columns b and c)
# column a gets added to the table, C* invalidates its cache entry
# clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
re-prepares on the fly and updates its local metadata to (a, b, c)
# prepared statement is now in C*’s cache again, with the same md5 abc123
# clientB sends an EXECUTE request for id abc123. Because the cache has been 
populated again, the query succeeds, but clientB has still not updated its 
metadata: it is still (b, c)
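A minimal Python sketch of the failure above (a toy model, not Cassandra or driver code). It assumes the statement id is an MD5 of the query text only and that the server clears its prepared-statement cache on ALTER, as described for CASSANDRA-7910; the `Server`/`Client` classes and the one-row result are hypothetical simplifications.

```python
import hashlib

def statement_id(query):
    # Pre-fix behaviour described above: the id depends on the query text only.
    return hashlib.md5(query.encode("utf-8")).hexdigest()

class Server:
    def __init__(self, columns):
        self.columns = list(columns)
        self.cache = {}  # statement id -> query text

    def prepare(self, query):
        sid = statement_id(query)
        self.cache[sid] = query
        return sid, list(self.columns)

    def alter_add_column(self, name):
        self.columns.insert(0, name)
        self.cache.clear()  # CASSANDRA-7910: invalidate prepared statements

    def execute(self, sid):
        if sid not in self.cache:
            return None  # stands in for an UNPREPARED error
        return [f"value_of_{c}" for c in self.columns]  # one row, no metadata

class Client:
    def __init__(self, server, query):
        self.server, self.query = server, query
        self.sid, self.columns = server.prepare(query)

    def execute(self):
        row = self.server.execute(self.sid)
        if row is None:  # UNPREPARED: re-prepare and retry
            self.sid, self.columns = self.server.prepare(self.query)
            row = self.server.execute(self.sid)
        return row

server = Server(["b", "c"])
client_a = Client(server, "SELECT * FROM ks.t")
client_b = Client(server, "SELECT * FROM ks.t")
server.alter_add_column("a")
client_a.execute()          # gets UNPREPARED, re-prepares, sees (a, b, c)
row_b = client_b.execute()  # succeeds: the same id is cached again
print(client_b.columns, len(row_b))  # ['b', 'c'] 3
```

clientB ends up decoding a 3-value row with a 2-column description, and nothing in the exchange tells it so.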

One solution that was suggested is to include a hash of the result set metadata 
in the md5. This way the md5 would change at step 3, and any client using the 
old md5 would get an UNPREPARED, regardless of whether another client already 
re-prepared.
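A small Python sketch of the suggested id scheme, assuming a hypothetical encoding (query text followed by `name:type` per result column, NUL-separated) purely for illustration; it is not how Cassandra actually derives ids. Once the metadata participates in the hash, the id clientA gets after re-preparing differs from the one clientB still holds.

```python
import hashlib

def statement_id(query, columns):
    # Hypothetical encoding: hash the query text, then each column as "name:type".
    h = hashlib.md5(query.encode("utf-8"))
    for name, cql_type in columns:
        h.update(f"\x00{name}:{cql_type}".encode("utf-8"))
    return h.hexdigest()

query = "SELECT * FROM ks.t"
old_id = statement_id(query, [("b", "int"), ("c", "text")])
new_id = statement_id(query, [("a", "int"), ("b", "int"), ("c", "text")])

cache = {new_id: query}  # clientA re-prepared after the ALTER
print(old_id in cache)   # False: clientB's next EXECUTE gets UNPREPARED
```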
-----
*_Resolution (2017/02/13):_*
The following changes were made to native protocol v5:
- the PREPARED response includes {{result_metadata_id}}, a hash of the result 
set metadata.
- every EXECUTE message must provide {{result_metadata_id}} in addition to the 
prepared statement id. If it doesn't match the current one on the server, it 
means the client is operating on a stale schema.
- to notify the client, the server returns a ROWS response with a new 
{{Metadata_changed}} flag, the new {{result_metadata_id}} and the updated 
result metadata (this overrides the {{No_metadata}} flag, even if the client 
had requested it)
- the client updates its copy of the result metadata before it decodes the 
results.
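The v5 rules above can be sketched in Python as a server-side check plus a client-side update. This is a hedged model, not the Cassandra implementation: the MD5-over-`name:type` hashing, the `RowsResponse`/`ClientEntry` types and the function names are all hypothetical stand-ins for the protocol fields {{result_metadata_id}}, {{Metadata_changed}} and {{No_metadata}}.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

Column = Tuple[str, str]  # (name, CQL type)

def result_metadata_id(columns: List[Column]) -> bytes:
    # Hypothetical hashing scheme: MD5 over "name:type;" entries.
    h = hashlib.md5()
    for name, cql_type in columns:
        h.update(f"{name}:{cql_type};".encode("utf-8"))
    return h.digest()

@dataclass
class RowsResponse:
    rows: list
    metadata_changed: bool = False
    no_metadata: bool = False
    new_metadata_id: Optional[bytes] = None
    columns: Optional[List[Column]] = None

def server_execute(columns: List[Column], rows: list,
                   client_metadata_id: bytes, skip_metadata: bool) -> RowsResponse:
    current_id = result_metadata_id(columns)
    if client_metadata_id != current_id:
        # Stale client: send the new id and full metadata, ignoring skip_metadata.
        return RowsResponse(rows, metadata_changed=True,
                            new_metadata_id=current_id, columns=list(columns))
    if skip_metadata:
        return RowsResponse(rows, no_metadata=True)
    return RowsResponse(rows, columns=list(columns))

@dataclass
class ClientEntry:
    statement_id: bytes
    metadata_id: bytes
    columns: List[Column]

def client_handle_rows(entry: ClientEntry, resp: RowsResponse) -> List[dict]:
    if resp.metadata_changed:
        # Update the cached metadata before decoding the rows.
        entry.metadata_id = resp.new_metadata_id
        entry.columns = resp.columns
    names = [name for name, _ in entry.columns]
    return [dict(zip(names, row)) for row in resp.rows]

old_cols = [("b", "int"), ("c", "text")]
new_cols = [("a", "int")] + old_cols
entry = ClientEntry(b"abc123", result_metadata_id(old_cols), old_cols)
resp = server_execute(new_cols, [(1, 2, "x")], entry.metadata_id, skip_metadata=True)
print(client_handle_rows(entry, resp))  # [{'a': 1, 'b': 2, 'c': 'x'}]
```

Because the update happens before decoding, the first response after the schema change is already decoded with (a, b, c).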

So the scenario above would now look like:
# {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, and 
result set (b, c) that hashes to cde456
# column a gets added to the table; C* does not invalidate its cache entry, but 
only updates the result set metadata to (a, b, c), which hashes to fff789
# client sends an EXECUTE request for (statementId=abc123, resultId=cde456) with 
the skip_metadata flag set
# cde456 != fff789, so C* responds with ROWS(..., no_metadata=false, 
metadata_changed=true, new_metadata_id=fff789, col specs for (a, b, c))
# client updates its column specifications and sends subsequent EXECUTE 
requests with (statementId=abc123, resultId=fff789)

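This works the same way with multiple clients: each client detects its own stale 
{{result_metadata_id}} on its next EXECUTE, whether or not another client has 
already received the new metadata.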
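A toy Python simulation of the new scenario with two clients, under stated simplifications: the statement id is the fixed placeholder abc123 from the steps above, the metadata id is a hypothetical MD5 over the column names, and row data is omitted from responses. Unlike the old flow, clientA's refresh does not hide the change from clientB.

```python
import hashlib

def md(columns):
    # Hypothetical result metadata id: MD5 over the comma-joined column names.
    return hashlib.md5(",".join(columns).encode("utf-8")).hexdigest()

class Server:
    def __init__(self, columns):
        self.columns = list(columns)

    def prepare(self):
        # Statement id stays fixed; the result metadata id is returned alongside it.
        return "abc123", md(self.columns), list(self.columns)

    def alter_add_column(self, name):
        # No cache invalidation: the cached statement just gets new result metadata.
        self.columns.insert(0, name)

    def execute(self, statement_id, metadata_id):
        current = md(self.columns)
        if metadata_id != current:
            return {"metadata_changed": True, "new_metadata_id": current,
                    "columns": list(self.columns)}
        return {"metadata_changed": False}  # rows omitted in this sketch

class Client:
    def __init__(self, server):
        self.server = server
        self.statement_id, self.metadata_id, self.columns = server.prepare()
        self.refreshes = 0

    def execute(self):
        resp = self.server.execute(self.statement_id, self.metadata_id)
        if resp["metadata_changed"]:
            self.metadata_id = resp["new_metadata_id"]
            self.columns = resp["columns"]
            self.refreshes += 1

server = Server(["b", "c"])
client_a, client_b = Client(server), Client(server)
server.alter_add_column("a")
client_a.execute()  # clientA learns about (a, b, c)
client_b.execute()  # clientB learns too, even though clientA already refreshed
print(client_a.columns, client_b.columns)  # ['a', 'b', 'c'] ['a', 'b', 'c']
```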


> Include hash of result set metadata in prepared statement id
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-10786
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: CQL
>            Reporter: Olivier Michallat
>            Assignee: Alex Petrov
>            Priority: Minor
>              Labels: client-impacting, doc-impacting, protocolv5
>             Fix For: 3.11.x
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
