RussellSpitzer opened a new issue #3542: URL: https://github.com/apache/iceberg/issues/3542
Currently, the column projection section of the spec has this text:

```
Column Projection

Columns in Iceberg data files are selected by field id. The table schema’s column names and order may change after a data file is written, and projection must be done using field ids. If a field id is missing from a data file, its value for each row should be null.

For example, a file may be written with schema 1: a int, 2: b string, 3: c double and read using projection schema 3: measurement, 2: name, 4: a. This must select file columns c (renamed to measurement), b (now called name), and a column of null values called a; in that order.
```

While technically true, this seems to miss the well-supported concept of NameMappings, which already exist in several of the engines and are explicitly used in the Snapshot, Migrate, and AddFiles actions.

I suggest we fully document NameMapping in the spec so we have fewer surprises when moving from one engine to another. Something like:

```
...
Files without field ids, like those imported from another system, will have ids assigned to them based on the table's NameMapping. The name mapping should be a JSON map of column name to field ids stored in the table properties under the key $KEYNAMEHERE. This mapping is only used on files when field ids are completely absent.
```
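To make the proposed behavior concrete, here is a minimal Python sketch of the two steps described above: assigning field ids to a file that has none by looking up column names in a name mapping, then projecting by field id with nulls for missing ids. The flat name-to-id JSON shape follows the wording suggested in this issue and is an assumption, as are the helper names `assign_field_ids` and `project`; this is not the API of any engine.

```python
import json

# Hypothetical name mapping: a flat JSON map of column name -> field id,
# following the shape suggested in the issue text (an assumption, not a
# confirmed serialization format).
NAME_MAPPING_JSON = '{"a": 1, "b": 2, "c": 3}'


def assign_field_ids(file_columns, name_mapping):
    """Return {field_id: file_column_name} for a file written without field ids.

    Columns that do not appear in the mapping get no id and are never selected.
    """
    return {name_mapping[name]: name for name in file_columns if name in name_mapping}


def project(rows, ids_to_file_columns, projection):
    """Select columns by field id.

    `projection` is an ordered list of (field_id, output_name) pairs. A field
    id that is absent from the file yields None for every row.
    """
    projected = []
    for row in rows:
        out = {}
        for field_id, out_name in projection:
            file_col = ids_to_file_columns.get(field_id)
            out[out_name] = row.get(file_col) if file_col is not None else None
        projected.append(out)
    return projected


name_mapping = json.loads(NAME_MAPPING_JSON)

# A file imported from another system: column names only, no field ids.
file_columns = ["a", "b", "c"]
rows = [{"a": 1, "b": "x", "c": 2.5}]

ids = assign_field_ids(file_columns, name_mapping)

# Projection schema from the spec example: 3: measurement, 2: name, 4: a
projection = [(3, "measurement"), (2, "name"), (4, "a")]
result = project(rows, ids, projection)
print(result)  # [{'measurement': 2.5, 'name': 'x', 'a': None}]
```

The sketch applies the mapping only because the file carries no ids at all, matching the last sentence of the suggested text; a file that already has field ids would skip `assign_field_ids` entirely.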
