aokolnychyi commented on pull request #2240:
URL: https://github.com/apache/iceberg/pull/2240#issuecomment-786207748


   I see that the current approach tries to promote the sort columns on read, 
which is one way of doing this. I think I'd prefer the approach we took with 
`validateReferencedColumns` that does its validation during table creation and 
in a few other relevant places. If we do it that way, we don't need to modify 
any ORC and Parquet code and the scope of the change will be really small.
   
   Basically, we could modify `TableMetadata#newTableMetadata` as this:
   
   ```
       MetricsConfig metricsConfig = MetricsConfig.fromProperties(properties);
       metricsConfig.validateReferencedColumns(schema);
       MetricsConfig freshMetricsConfig = 
metricsConfig.copyWithPromotedSortColumns(freshSortOrder);
   
       Map<String, String> freshProperties = Maps.newHashMap(properties);
       freshProperties.putAll(freshMetricsConfig.toProperties());
   ```
   
   That will need two new methods in `MetricsConfig` (let's not make them 
public):
   
   ```
     MetricsConfig copyWithPromotedSortColumns(SortOrder sortOrder) {
       // create a copy
       // check the default mode
       // if none or count -> iterate through the sort order fields and promote 
as needed if not set by the user
     }
   
     Map<String, String> toProperties() {
       // map into table props
     }
   ```
   
   What are your thoughts, @RussellSpitzer @rdblue @holdenk @shardulm94 @pvary?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to