Grant Henke created KUDU-3146:
---------------------------------
Summary: Consistent table metadata for scans
Key: KUDU-3146
URL: https://issues.apache.org/jira/browse/KUDU-3146
Project: Kudu
Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Grant Henke
Currently there is a window between generating/deserializing a scan token and
opening the scanner during which the table schema can change, which can result
in an invalid schema when scanning the table. This is especially the case when
a column is renamed or dropped and then another column with the same name is
created.
This has been partially worked around on the client side by leveraging column
ids and mapping the scan token projection to the new schema based on those ids.
However, this doesn't work when the scan token sends its own metadata
(KUDU-1802).
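A minimal sketch of the hazard and of the id-based mapping, in Python. The
Column type and the resolve_by_name/resolve_by_id helpers are hypothetical and
exist only for illustration; they are not part of the Kudu client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column in a table schema."""
    col_id: int
    name: str
    type: str


def resolve_by_name(projection_names, schema):
    """Map a projection onto a schema using column names only."""
    by_name = {c.name: c for c in schema}
    return [by_name[n] for n in projection_names]


def resolve_by_id(projection_ids, schema):
    """Map a projection onto a schema using stable column ids."""
    by_id = {c.col_id: c for c in schema}
    return [by_id[i] for i in projection_ids]


# Schema after the token was generated: "value" (id 1) was renamed to
# "old_value", and a new column named "value" was added with a fresh id.
new_schema = [
    Column(0, "key", "int64"),
    Column(1, "old_value", "string"),
    Column(2, "value", "int32"),
]

# What the token recorded when it was generated against the old schema.
token_names = ["key", "value"]
token_ids = [0, 1]

by_name = resolve_by_name(token_names, new_schema)
by_id = resolve_by_id(token_ids, new_schema)

# Name-based resolution silently picks the new, unrelated column.
print(by_name[1])
# Id-based resolution still finds the column the token was built for.
print(by_id[1])
```

Resolving by name returns the newly created int32 column, while resolving by
id returns the original string column under its new name.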
We should provide a mechanism to allow the schema to be consistent and
guaranteed to work from the point a scan token is generated to the time it is
run/completed.
A simple approach might be to allow column ids to be passed on the scan
request. Instead of handling the mapping on the client side, this makes the
column ids more of a server-side concern again (which was the original intent).
This is also how table renames are handled, by passing the table id. Of course,
this wouldn't support destructive changes such as dropping a column. Supporting
those would require a much larger change: keeping the dropped column for a
period of time and scanning with the schema as of the scan's snapshot time.
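To illustrate the proposed behavior, here is a rough Python sketch of a server
resolving a projection from column ids carried on the request. The
resolve_projection function, the ColumnNotFound error, and the dict-based
schema are all assumptions made for this example; they do not model Kudu's
actual scan RPC.

```python
class ColumnNotFound(Exception):
    """Raised when a requested column id is no longer in the schema."""


def resolve_projection(requested_ids, schema_by_id):
    """Resolve column ids from a scan request against the current schema.

    schema_by_id maps column id -> (name, type). This layout is hypothetical.
    """
    resolved = []
    for col_id in requested_ids:
        if col_id not in schema_by_id:
            # A dropped column cannot be served without keeping old schemas.
            raise ColumnNotFound(f"column id {col_id} was dropped")
        resolved.append(schema_by_id[col_id])
    return resolved


# The token was generated when column 1 was still called "value".
schema_after_rename = {0: ("key", "int64"), 1: ("renamed_value", "string")}
schema_after_drop = {0: ("key", "int64")}

# A rename is transparent: the id still resolves to the same column.
renamed = resolve_projection([0, 1], schema_after_rename)

# A drop is not: the request fails instead of scanning the wrong data.
try:
    resolve_projection([0, 1], schema_after_drop)
    drop_error = None
except ColumnNotFound as exc:
    drop_error = str(exc)
```

In this sketch a rename is handled without any client involvement, while a
drop surfaces as an explicit error, which matches the limitation noted above.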