rdblue commented on a change in pull request #1399:
URL: https://github.com/apache/iceberg/pull/1399#discussion_r487596377
##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
##########
@@ -540,7 +541,8 @@ private static void importUnpartitionedSparkTable(
       Configuration conf = spark.sessionState().newHadoopConf();
       MetricsConfig metricsConfig = MetricsConfig.fromProperties(targetTable.properties());
       String nameMappingString = targetTable.properties().get(TableProperties.DEFAULT_NAME_MAPPING);
-      NameMapping nameMapping = nameMappingString != null ? NameMappingParser.fromJson(nameMappingString) : null;
+      NameMapping nameMapping = nameMappingString != null ?
+          NameMappingParser.fromJson(nameMappingString) : MappingUtil.create(targetTable.schema());
Review comment:
I think we should change to use a name mapping by default in a separate PR. For now, let's just support a name mapping when importing ORC data.

My rationale is that I think the two are independently useful. Someone might want to change the default, but not cherry-pick the ORC changes. Similarly, someone might want to use a name mapping with ORC, but not want to default to name mapping.

Also, I think that we would want to default to name mapping slightly differently. It doesn't make sense to me to create a temporary mapping that is used only for metrics here, unless that mapping is also used to read the data. So I would prefer to update tables when importing and add a mapping to the table metadata by default, if it is not already there. A rough sketch of that approach is below.
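Something along these lines (the resolveNameMapping helper name is only for illustration; the types and the DEFAULT_NAME_MAPPING property key are the same ones used in the diff, and persisting the mapping is the part that differs from this PR):

import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.mapping.MappingUtil;
import org.apache.iceberg.mapping.NameMapping;
import org.apache.iceberg.mapping.NameMappingParser;

// Sketch only: resolve the table's name mapping, creating one from the current
// schema and persisting it in table metadata if the property is not already set,
// so the same mapping is used for import-time metrics and for later reads.
private static NameMapping resolveNameMapping(Table targetTable) {
  String nameMappingString = targetTable.properties().get(TableProperties.DEFAULT_NAME_MAPPING);
  if (nameMappingString != null) {
    return NameMappingParser.fromJson(nameMappingString);
  }

  // no mapping yet: derive one from the schema and store it as the table default
  NameMapping nameMapping = MappingUtil.create(targetTable.schema());
  targetTable.updateProperties()
      .set(TableProperties.DEFAULT_NAME_MAPPING, NameMappingParser.toJson(nameMapping))
      .commit();
  return nameMapping;
}

That way the mapping used to collect metrics during import is the same one the table will use to read the imported files, instead of a throwaway mapping that only exists for this code path.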