[GitHub] [iceberg] edgarRd commented on issue #1345: Spark: Follow name mapping while importing ORC tables

GitBox Wed, 02 Sep 2020 17:21:18 -0700


edgarRd commented on issue #1345:
URL: https://github.com/apache/iceberg/issues/1345#issuecomment-686155144



   > Internally, we started assigning a default name mapping when migrating 
existing tables to Iceberg. I am not sure a name mapping makes sense when we 
create a fresh Iceberg table. 
   
   Yeah, I don't think a name mapping makes sense for Iceberg tables that won't
import any data. For migrated tables, a default name mapping that matches the
columns of the target table should be fine. That's what I did, although
programmatically, in
https://github.com/apache/iceberg/pull/1399/files#diff-531684170cfef0f1493ced85a0c8dbfcR544
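   To make "a default name mapping equivalent to the columns of the target
table" concrete, here is a minimal sketch under simplifying assumptions: the
schema is flat (no nested structs, lists or maps), and `DefaultNameMapping`,
`Field`, `fromSchema` and `toJson` are hypothetical names for illustration
only. It does not use Iceberg's own helpers (`MappingUtil.create(schema)` and
`NameMappingParser`, which also handle nested fields); it only mimics the JSON
shape stored in the `schema.name-mapping.default` table property.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: a flat (non-nested) table schema where each
// top-level column has an Iceberg field ID and a name.
final class DefaultNameMapping {

    static final class Field {
        final int id;
        final String name;

        Field(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Build the default mapping: each column name in the target schema maps to
    // that column's field ID, so imported files using the same names resolve.
    static Map<String, Integer> fromSchema(List<Field> schema) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (Field f : schema) {
            mapping.put(f.name, f.id);
        }
        return mapping;
    }

    // Render the mapping in the JSON shape used for the name mapping property
    // (flat fields only; names are not JSON-escaped in this sketch).
    static String toJson(Map<String, Integer> mapping) {
        List<String> entries = new ArrayList<>();
        for (Map.Entry<String, Integer> e : mapping.entrySet()) {
            entries.add("{\"field-id\": " + e.getValue() + ", \"names\": [\"" + e.getKey() + "\"]}");
        }
        return "[" + String.join(", ", entries) + "]";
    }

    public static void main(String[] args) {
        List<Field> schema = List.of(new Field(1, "id"), new Field(2, "data"));
        // prints [{"field-id": 1, "names": ["id"]}, {"field-id": 2, "names": ["data"]}]
        System.out.println(toJson(fromSchema(schema)));
    }
}
```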
 
   
   > My assumption was that we would need to assign a name mapping in the 
import/snapshot commands.
   
   I think the common case is that users importing a table want to import all
of its columns, so the default name mapping should match the target table
schema. If I understand correctly, this is consistent with both the default you
mentioned and my change mentioned above.
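   As a rough illustration of how such a default mapping is applied on import,
the sketch below assigns field IDs to the columns of an imported file by name.
It assumes a flat mapping of column name to field ID, and `ResolveByName` and
`resolve` are hypothetical names, not Iceberg APIs. A column present in the
file but absent from the target schema simply receives no ID.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: assigning Iceberg field IDs to the columns of an imported
// file that carries no IDs of its own, using a name mapping built from the
// target table schema (column name -> field ID).
final class ResolveByName {

    // Returns fileColumn -> field ID for every column the mapping knows about.
    // Columns missing from the mapping get no ID and are left out of the result.
    static Map<String, Integer> resolve(List<String> fileColumns, Map<String, Integer> nameMapping) {
        Map<String, Integer> resolved = new LinkedHashMap<>();
        for (String column : fileColumns) {
            Integer id = nameMapping.get(column);
            if (id != null) {
                resolved.put(column, id);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = new HashMap<>();
        mapping.put("id", 1);
        mapping.put("data", 2);
        // "legacy_col" exists in the old file but not in the target table schema.
        // prints {id=1, data=2}
        System.out.println(resolve(List.of("id", "legacy_col", "data"), mapping));
    }
}
```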
   
   > 
   > Also, we ignore Parquet metrics for columns we could not determine ids for 
in SparkTableUtil. Should we do the same for ORC?
   
   Yes, this behavior should be covered by my change in
https://github.com/apache/iceberg/pull/1399/files#diff-ea7008dc637345e7a36b5e1b5a579e13L264
. In addition, the ORC metrics implementation already skips columns that have
no Iceberg field mapped by ID:
https://github.com/apache/iceberg/pull/1399/files#diff-ea7008dc637345e7a36b5e1b5a579e13L116
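   The "skip columns without an ID" rule for metrics can be sketched like this.
It is a simplified model, not Iceberg's ORC metrics code: `MappedMetrics` and
`valueCountsById` are hypothetical names, and only per-column value counts are
shown. Metrics are re-keyed from file column names to field IDs, and any column
whose ID could not be determined is dropped rather than guessed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "skip unmapped columns" rule for file metrics.
final class MappedMetrics {

    // Per-column value counts read from a file, keyed by column name, are
    // re-keyed by Iceberg field ID. A column whose ID is unknown (null from the
    // lookup) is dropped instead of being reported under a wrong field.
    static Map<Integer, Long> valueCountsById(Map<String, Long> countsByColumn,
                                              Map<String, Integer> idByColumn) {
        Map<Integer, Long> byId = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : countsByColumn.entrySet()) {
            Integer id = idByColumn.get(e.getKey());
            if (id == null) {
                continue; // no Iceberg field mapped for this column: skip its metrics
            }
            byId.put(id, e.getValue());
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("id", 100L);
        counts.put("legacy_col", 100L);
        counts.put("data", 98L);
        Map<String, Integer> ids = Map.of("id", 1, "data", 2);
        System.out.println(valueCountsById(counts, ids)); // {1=100, 2=98}
    }
}
```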
   

