rdblue opened a new pull request #1336: URL: https://github.com/apache/iceberg/pull/1336
Iceberg indexes schemas by name to look up fields, but it uses short names that omit the "element" and "value" names when a list or map has a struct element or value. For example, a list, `locations`, of structs `struct<lat: double, long: double>` will use the name `locations.lat` instead of `locations.element.lat`.

This works most of the time, but can lead to conflicts when a map value contains a field named `key`, because the short name of that field is the same as the name of the map's own key. In that case, the schema is rejected with a failure message like this:

`ValidationException: Invalid schema: multiple fields for name some_map.key: 146 and 144`

In addition, Spark passes names through `alterTable` that include the `value` and `element` names that the short form omits.

This PR fixes the problem by keeping track of two names for each field: the full name, which includes `element` and `value`, and a secondary "short" name that omits them. Indexing now uses all of the full names and adds any short names that are not ambiguous.

This change also requires indexing IDs separately rather than using a `BiMap`, because an ID can now have multiple names, and a `BiMap` cannot be inverted when several names map to the same ID.
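The indexing approach described above can be illustrated with a minimal Python sketch. This is not Iceberg's implementation (which is written in Java); the type classes (`Primitive`, `Field`, `Struct`, `ListType`, `MapType`), the function `index_by_name`, and the field IDs are all hypothetical and exist only for illustration. The sketch assumes a short name counts as ambiguous when it collides with a full name or with another short name.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


# Hypothetical, simplified schema types (not Iceberg's classes).
@dataclass
class Primitive:
    name: str


@dataclass
class Field:
    id: int
    name: str
    type: object


@dataclass
class Struct:
    fields: List[Field]


@dataclass
class ListType:
    element_id: int
    element: object


@dataclass
class MapType:
    key_id: int
    key: object
    value_id: int
    value: object


def index_by_name(schema: Struct) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Return (name -> id, id -> full name) for every field in the schema."""
    full: Dict[str, int] = {}          # full names, always indexed
    short: Dict[str, Set[int]] = {}    # candidate short names and their IDs

    def add(full_prefix, short_prefix, name, field_id):
        full_name = ".".join(full_prefix + (name,))
        if full_name in full:
            raise ValueError(f"multiple fields for name {full_name}")
        full[full_name] = field_id
        short_name = ".".join(short_prefix + (name,))
        if short_name != full_name:
            short.setdefault(short_name, set()).add(field_id)

    def visit(node, full_prefix, short_prefix):
        if isinstance(node, Struct):
            for f in node.fields:
                add(full_prefix, short_prefix, f.name, f.id)
                visit(f.type, full_prefix + (f.name,), short_prefix + (f.name,))
        elif isinstance(node, ListType):
            add(full_prefix, short_prefix, "element", node.element_id)
            # Only a struct element drops "element" from the short prefix.
            skip = isinstance(node.element, Struct)
            visit(node.element, full_prefix + ("element",),
                  short_prefix if skip else short_prefix + ("element",))
        elif isinstance(node, MapType):
            add(full_prefix, short_prefix, "key", node.key_id)
            add(full_prefix, short_prefix, "value", node.value_id)
            visit(node.key, full_prefix + ("key",), short_prefix + ("key",))
            # Only a struct value drops "value" from the short prefix.
            skip = isinstance(node.value, Struct)
            visit(node.value, full_prefix + ("value",),
                  short_prefix if skip else short_prefix + ("value",))

    visit(schema, (), ())

    # Start from all full names, then add only the unambiguous short names.
    name_to_id = dict(full)
    for name, ids in short.items():
        if name not in full and len(ids) == 1:
            name_to_id[name] = next(iter(ids))

    # IDs are indexed separately: each ID maps to its full name only.
    id_to_name = {field_id: name for name, field_id in full.items()}
    return name_to_id, id_to_name


# A map whose value struct has a field named "key", plus the list example.
schema = Struct([
    Field(1, "some_map", MapType(2, Primitive("string"), 3, Struct([
        Field(4, "key", Primitive("int")),
        Field(5, "x", Primitive("int")),
    ]))),
    Field(6, "locations", ListType(7, Struct([
        Field(8, "lat", Primitive("double")),
        Field(9, "long", Primitive("double")),
    ]))),
])
name_to_id, id_to_name = index_by_name(schema)
```

In this sketch `some_map.key` resolves to the map's key (ID 2), the nested field stays reachable as `some_map.value.key` (ID 4), and both `locations.lat` and `locations.element.lat` resolve to ID 8. Because ID 8 has two names, the reverse lookup is kept in a separate dictionary that holds only the full name.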
