aokolnychyi commented on issue #2068:
URL: https://github.com/apache/iceberg/issues/2068#issuecomment-774208295
I finally found time to get through recent comments. I believe the
discussion is moving in the right direction.
> 1. Add name mapping to the Iceberg spec so that it is well-defined and we
> have test cases to validate
> 2. Document how name mappings change when a schema evolves (allows adding
> aliases)
+1. We have tested how name mappings behave when a schema is evolved (e.g.
simple cases like adding columns in the middle). That seems to work fine even
when the imported files don't have column ids.
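
To illustrate the case above, here is a minimal sketch in plain Python, not the Iceberg API. It assumes a flat schema and uses the `field-id` / `names` layout of the name mapping JSON; `resolve_ids` and the sample columns are hypothetical names made up for this example. It shows that a file without column ids still resolves to the same field ids after a column is added in the middle.

```python
# Simplified, hypothetical model of a name mapping: each entry ties one
# field id to the names (including aliases) it may carry in data files.
name_mapping = [
    {"field-id": 1, "names": ["id"]},
    {"field-id": 2, "names": ["data"]},
]


def resolve_ids(file_columns, mapping):
    """Return {column name: field id or None} for a file without column ids."""
    by_name = {
        name: entry["field-id"] for entry in mapping for name in entry["names"]
    }
    return {col: by_name.get(col) for col in file_columns}


# Columns of a file imported before the schema change (no ids in the file).
old_file_columns = ["id", "data"]

# Schema evolution: a new column is added in the middle and gets a fresh id.
evolved_mapping = [
    name_mapping[0],
    {"field-id": 3, "names": ["category"]},
    name_mapping[1],
]

# The old file still resolves by name; the new column is simply absent.
resolved = resolve_ids(old_file_columns, evolved_mapping)
print(resolved)
```

Because lookup is by name rather than by position, the position of the new column does not affect how older files are read.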
> 3. Make sure that when we import files, there is a name mapping set for
> the table.
I think we are already doing it in the snapshot and migrate procedures for
Spark. If we are to add `add_files`, we can assign a default name mapping too.
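
As a rough sketch of what assigning a default name mapping could look like, the snippet below builds a mapping with one name per field and stores it as JSON under the `schema.name-mapping.default` table property. This is plain Python under stated assumptions, not the Iceberg implementation: `default_name_mapping` and the flat `(id, name)` schema representation are hypothetical.

```python
import json


def default_name_mapping(schema_fields):
    """Map each field id to its current name only (no aliases yet)."""
    return [{"field-id": field_id, "names": [name]} for field_id, name in schema_fields]


# Hypothetical flat schema given as (field id, column name) pairs.
schema_fields = [(1, "id"), (2, "data")]

# The mapping is serialized to JSON and kept as a table property.
table_properties = {
    "schema.name-mapping.default": json.dumps(default_name_mapping(schema_fields))
}
print(table_properties)
```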
> 4. Build correct metadata from imported files based on the name mapping.
We try to respect name mappings while importing files in Spark (not sure
about Trino), but we trust that the name mapping is correct.
> 5. Identify problems with the name mapping, like files with no readable /
> mapped fields or incompatible data types
It sounds like we should enhance our import code, which currently reads
footers to import stats, so that it also performs schema checks?
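
A schema check of this kind might look roughly like the sketch below. It is plain Python, not existing Iceberg code: `check_file_schema`, the type names as strings, and the flat schema are all assumptions for illustration. The only type differences it tolerates are the `int` to `long` and `float` to `double` promotions. It flags the two problems mentioned above: files with no mapped fields and columns with incompatible types.

```python
# Type widenings treated as compatible in this sketch (an assumption).
ALLOWED_PROMOTIONS = {("int", "long"), ("float", "double")}


def check_file_schema(file_columns, mapping, table_types):
    """Return a list of problems found for one imported file.

    file_columns: {column name: type} as read from the file footer
    mapping:      name mapping entries with "field-id" and "names"
    table_types:  {field id: type} from the table schema
    """
    by_name = {
        name: entry["field-id"] for entry in mapping for name in entry["names"]
    }
    problems = []
    mapped = 0
    for name, file_type in file_columns.items():
        field_id = by_name.get(name)
        if field_id is None:
            continue  # unmapped columns cannot be read, so there is nothing to compare
        mapped += 1
        expected = table_types[field_id]
        if file_type != expected and (file_type, expected) not in ALLOWED_PROMOTIONS:
            problems.append(
                f"column '{name}': file type {file_type} "
                f"is incompatible with table type {expected}"
            )
    if mapped == 0:
        problems.append("no columns in the file are mapped to table fields")
    return problems


mapping = [
    {"field-id": 1, "names": ["id"]},
    {"field-id": 2, "names": ["data"]},
]
table_types = {1: "long", 2: "string"}

ok = check_file_schema({"id": "int", "data": "string"}, mapping, table_types)
unmapped = check_file_schema({"foo": "int"}, mapping, table_types)
bad_type = check_file_schema({"id": "string"}, mapping, table_types)
print(ok, unmapped, bad_type)
```

Running the check while the footer is already being read for stats would avoid opening each file a second time.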