rdblue commented on PR #5932: URL: https://github.com/apache/iceberg/pull/5932#issuecomment-1272077852
Thanks for taking a look at this, @the-other-tim-brown, but I'm not sure that this is the right way to accomplish what you want. The fallback ID assignment is really old and makes assumptions about how the data has evolved, specifically that position-based column resolution is valid (just like CSV with no header). This works for top-level columns, but it does not give correct results for nested fields.

With position-based column resolution, you can safely add columns to the end of the schema. So you can have some files with columns `1: a, 2: b` and some with columns `1: a, 2: b, 3: c` (note that ID assignment is consistent).

The problem with nested columns is that the top-level field assignment can change the assignment for nested fields. For example: `1: a struct<3: x, 4: y>, 2: b` and `1: a struct<4: x, 5: y>, 2: b, 3: c`. The new top-level column `c` takes ID 3, so `x` and `y` shift from 3 and 4 to 4 and 5, and the two files no longer agree on nested IDs.

I think what you probably want is to use a name mapping, which is more flexible and can probably handle what you want to do.
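To make the ID shift concrete, here is a minimal Python sketch. It is not Iceberg's actual code: `assign_fallback_ids`, the `(name, children)` schema tuples, and the dotted paths are all hypothetical, and the level-by-level numbering is an assumption chosen because it reproduces the IDs in the example above.

```python
from collections import deque


def assign_fallback_ids(schema):
    """Assign IDs purely by position, one nesting level at a time.

    `schema` is a list of (name, children) pairs; `children` has the same
    shape and is empty for primitive fields. Returns {dotted_path: id}.
    Hypothetical helper, not an Iceberg API.
    """
    ids = {}
    next_id = 1
    queue = deque([("", schema)])
    while queue:
        prefix, fields = queue.popleft()
        for name, children in fields:
            path = prefix + name
            ids[path] = next_id
            next_id += 1
            if children:
                # Nested fields are numbered only after the current level is done.
                queue.append((path + ".", children))
    return ids


# Two data files; the newer one adds top-level column `c` at the end.
old_file = [("a", [("x", []), ("y", [])]), ("b", [])]
new_file = [("a", [("x", []), ("y", [])]), ("b", []), ("c", [])]

old_ids = assign_fallback_ids(old_file)
new_ids = assign_fallback_ids(new_file)

# Fields that exist in both files but received different IDs.
conflicting = sorted(p for p in old_ids if new_ids[p] != old_ids[p])
print(old_ids)
print(new_ids)
print(conflicting)
```

Under these assumptions the top-level IDs for `a` and `b` stay stable across both files, while `a.x` and `a.y` get different IDs once `c` is appended. That is the inconsistency a name-based mapping avoids.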
