aokolnychyi commented on pull request #2240:
URL: https://github.com/apache/iceberg/pull/2240#issuecomment-786207748
I see that the current approach tries to promote the sort columns on read,
which is one way of doing this. I think I'd prefer the approach we took with
`validateReferencedColumns` that does its validation during table creation and
in a few other relevant places. If we do it that way, we don't need to modify
any ORC and Parquet code and the scope of the change will be really small.
Basically, we could modify `TableMetadata#newTableMetadata` as this:
```
MetricsConfig metricsConfig = MetricsConfig.fromProperties(properties);
metricsConfig.validateReferencedColumns(schema);
MetricsConfig freshMetricsConfig =
metricsConfig.copyWithPromotedSortColumns(freshSortOrder);
Map<String, String> freshProperties = Maps.newHashMap(properties);
freshProperties.putAll(freshMetricsConfig.toProperties());
```
That will need two new methods in `MetricsConfig` (let's not make them
public):
```
MetricsConfig copyWithPromotedSortColumns(SortOrder sortOrder) {
// create a copy
// check the default mode
// if none or count -> iterate through the sort order fields and promote
as needed if not set by the user
}
Map<String, String> toProperties() {
// map into table props
}
```
What are your thoughts, @RussellSpitzer @rdblue @holdenk @shardulm94 @pvary?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]