Tishj opened a new issue, #13446: URL: https://github.com/apache/iceberg/issues/13446
### Apache Iceberg version

None

### Query engine

None

### Please describe the bug 🐞

In the spec (https://iceberg.apache.org/spec/), it is unclear what the difference is between a column id, a field-id, and a source-id. (I believe they are all synonymous with each other.)

### Source Id

There are 16 mentions of `source-id`, and none of them explains how it differs from field-id or column-id. The only way to piece this information together is this line:

> A source column id or a list of source column ids from the table’s schema

That is the **only** place where `source-id` is referenced in an explanation, and it's not even searchable because it uses **source column id**, a term that is never used in conjunction with `source-id`.

### Column Id and Field Id

There is a field `last-column-id` in the Table Metadata, which talks about a column ID. I can decipher from context that this refers to the `id` field of the entries in the Schema, but this is never concisely tied together.

`column id` is synonymous with `field-id`, which is also only really explained by this:

> Column IDs are required to be stored as [field IDs](http://github.com/apache/parquet-format/blob/40699d05bd24181de6b1457babbee2c16dce3803/src/main/thrift/parquet.thrift#L459) on the parquet schema.

For context, I have been working with this spec for the past months, and for the longest time I was convinced there was a difference between the two, expecting one of them (column id or source id) to refer to the **order** of appearance in the schema.

### Inconsistent use of `-` and `_`

I'm not talking about the tables where they are listed, those are clear as day, but when they're referred to, the correct name isn't always used.
One example with `next-row-id`:

> - When a table is upgraded to v3, `next_row_id` should be initialized to 0
> - When committing a new snapshot `next-row-id` must be incremented by at least the number of newly assigned row ids in the snapshot

This means you have to search for both variants to find all mentions of the field.

It would be great if some time could be spent on making these connections clearer, as the answers are currently **well** hidden.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [x] I cannot contribute a fix for this bug at this time
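
The reading described under "Source Id" and "Column Id and Field Id" can be sketched as follows. This is only an illustration, assuming that a `source-id` in a sort order points at the `id` of a schema field, and that `last-column-id` tracks the highest id assigned so far. The metadata fragment is hypothetical and heavily trimmed, the helper name `resolve_source_columns` is made up for this example, and nested fields are ignored.

```python
# Hypothetical, trimmed table-metadata fragment; all values are made up.
metadata = {
    "last-column-id": 3,
    "schemas": [
        {
            "schema-id": 0,
            "type": "struct",
            "fields": [
                {"id": 1, "name": "id", "required": True, "type": "long"},
                {"id": 2, "name": "ts", "required": False, "type": "timestamp"},
                {"id": 3, "name": "data", "required": False, "type": "string"},
            ],
        }
    ],
    "sort-orders": [
        {
            "order-id": 1,
            "fields": [
                {
                    "transform": "identity",
                    "source-id": 2,  # assumed to reference the schema field with id 2
                    "direction": "asc",
                    "null-order": "nulls-first",
                }
            ],
        }
    ],
}


def resolve_source_columns(metadata, schema_index=0):
    """Map each sort field's source-id to the name of the schema field with that id.

    Only top-level schema fields are considered in this sketch.
    """
    fields_by_id = {f["id"]: f for f in metadata["schemas"][schema_index]["fields"]}
    return [
        fields_by_id[sort_field["source-id"]]["name"]
        for order in metadata["sort-orders"]
        for sort_field in order["fields"]
    ]


resolved = resolve_source_columns(metadata)
highest_id = max(f["id"] for f in metadata["schemas"][0]["fields"])

print(resolved)
print(highest_id == metadata["last-column-id"])
```

In this fragment the lookup is by id and not by position, which is the distinction that was hard to find in the spec text. The ids happen to match the order of appearance here, but the code does not rely on that.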
