edgarRd commented on issue #1345: URL: https://github.com/apache/iceberg/issues/1345#issuecomment-686155144
> Internally, we started assigning a default name mapping when migrating existing tables to Iceberg. I am not sure a name mapping makes sense when we create a fresh Iceberg table.

Yeah, I don't think a name mapping makes sense for Iceberg tables that won't import any data. For migrated tables, a default name mapping equivalent to the columns of the target table should be fine. That's what I did, but programmatically, in https://github.com/apache/iceberg/pull/1399/files#diff-531684170cfef0f1493ced85a0c8dbfcR544

> My assumption was that we would need to assign a name mapping in the import/snapshot commands.

I think the common case is that, when importing a table, users want to import all the columns, so the default name mapping should be the same as the target table schema. If I understand correctly, this is consistent with the default you mentioned and with my change mentioned above.

> Also, we ignore Parquet metrics for columns we could not determine ids for in SparkTableUtil. Should we do the same for ORC?

Yes, this behavior should be covered by my change in https://github.com/apache/iceberg/pull/1399/files#diff-ea7008dc637345e7a36b5e1b5a579e13L264, and the ORC metrics implementation already skips columns that do not have an Iceberg field mapped by the id: https://github.com/apache/iceberg/pull/1399/files#diff-ea7008dc637345e7a36b5e1b5a579e13L116
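
To make the two points above concrete, here is a minimal, standalone Python sketch. It is not the code from the PR and does not use the Iceberg API: `TABLE_SCHEMA`, `default_name_mapping`, and `metrics_for_mapped_columns` are hypothetical names chosen for illustration, and the schema is reduced to flat `(field id, name)` pairs. It only assumes the JSON shape of a name mapping (a list of entries with `field-id` and `names`).

```python
import json

# Hypothetical, simplified target table schema: (field id, column name) pairs.
TABLE_SCHEMA = [(1, "id"), (2, "data"), (3, "ts")]


def default_name_mapping(schema):
    """Build a default mapping in which each column name maps to its own field id."""
    return [{"field-id": field_id, "names": [name]} for field_id, name in schema]


def metrics_for_mapped_columns(file_metrics, mapping):
    """Keep metrics only for file columns that resolve to a field id via the mapping."""
    name_to_id = {
        name: entry["field-id"] for entry in mapping for name in entry["names"]
    }
    return {
        name_to_id[column]: value
        for column, value in file_metrics.items()
        if column in name_to_id
    }


mapping = default_name_mapping(TABLE_SCHEMA)
print(json.dumps(mapping))

# "extra" exists in the imported file but has no entry in the mapping,
# so no field id can be determined for it and its metrics are dropped.
file_value_counts = {"id": 100, "data": 100, "extra": 100}
print(metrics_for_mapped_columns(file_value_counts, mapping))
```

In this sketch the default mapping mirrors the target schema exactly, so every table column in an imported file resolves to an id, while any column outside the schema contributes no metrics.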
