RussellSpitzer opened a new issue #3542:
URL: https://github.com/apache/iceberg/issues/3542


   Currently, the column projection section of the spec contains this text:
   ```
   Column Projection
   Columns in Iceberg data files are selected by field id. The table schema’s 
column names and order may change after a data file is written, and projection 
must be done using field ids. If a field id is missing from a data file, its 
value for each row should be null.
   
   For example, a file may be written with schema 1: a int, 2: b string, 3: c 
double and read using projection schema 3: measurement, 2: name, 4: a. This 
must select file columns c (renamed to measurement), b (now called name), and a 
column of null values called a; in that order.
   ```
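
To make the quoted example concrete, here is a minimal sketch of projection by field id. It is not Iceberg code: the row model (a dict keyed by field id) and the helper `project_row` are hypothetical, chosen only to illustrate the rule that columns are matched by id and that missing ids read as null.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model of one row from a file written with
# schema 1: a int, 2: b string, 3: c double (keyed by field id).
FILE_ROW: Dict[int, Any] = {1: 7, 2: "x", 3: 1.5}

# Projection schema from the quoted text: 3: measurement, 2: name, 4: a.
PROJECTION: List[Tuple[int, str]] = [(3, "measurement"), (2, "name"), (4, "a")]


def project_row(row_by_id: Dict[int, Any],
                projection: List[Tuple[int, str]]) -> Dict[str, Any]:
    """Select values by field id, in projection order.

    Field ids that are missing from the file yield None (null).
    """
    return {name: row_by_id.get(field_id) for field_id, name in projection}


projected = project_row(FILE_ROW, PROJECTION)
```

Note that the projected column `a` (id 4) is null even though the file has a column named `a` (id 1): the match is made on the id, never on the name.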
   
   The spec text, while technically true, seems to miss the well-supported 
concept of NameMappings, which already exist in several of the engines and are 
explicitly used in the Snapshot, Migrate, and AddFiles actions. I suggest we 
fully document NameMapping in the spec so we have fewer surprises when moving 
from one engine to another. Something like:
   
   ```
   ...
   
   Files without field ids, like those imported from another system, will have 
ids assigned to them based on the table's NameMapping. The name mapping should 
be a JSON map of column names to field ids stored in the table properties under 
the key $KEYNAMEHERE. This mapping is used only on files whose field ids are 
completely absent.
   ```
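
The proposed fallback can be sketched as follows. This is only an illustration of the wording above, not Iceberg's implementation: the flat `{"column name": field id}` JSON shape follows the proposal literally and may differ from the format the engines actually use, and `assign_field_ids` is a hypothetical helper. Returning `None` for columns missing from the mapping is also an assumption, since the proposal does not say what should happen to them.

```python
import json
from typing import Dict, List, Optional

# Hypothetical name mapping, in the flat shape the proposal describes.
NAME_MAPPING_JSON = '{"a": 1, "b": 2, "c": 3}'


def assign_field_ids(file_columns: List[str],
                     file_field_ids: Optional[Dict[str, int]],
                     name_mapping_json: str) -> Dict[str, Optional[int]]:
    """Resolve a field id for each column in a data file.

    If the file carries any field ids, they are used as-is and the name
    mapping is ignored. The mapping applies only when ids are completely
    absent (None or empty).
    """
    if file_field_ids:
        return dict(file_field_ids)
    mapping = json.loads(name_mapping_json)
    # Assumption: columns not covered by the mapping get no id (None).
    return {col: mapping.get(col) for col in file_columns}


# A file imported from another system, with no field ids at all.
imported = assign_field_ids(["a", "b", "z"], None, NAME_MAPPING_JSON)

# A file that already carries field ids: the mapping is not consulted.
native = assign_field_ids(["a"], {"a": 9}, NAME_MAPPING_JSON)
```

Once ids are assigned this way, the file can be read through the same id-based projection described in the column projection section.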


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


