rdblue commented on a change in pull request #1399:
URL: https://github.com/apache/iceberg/pull/1399#discussion_r487596377
##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
##########
@@ -540,7 +541,8 @@ private static void importUnpartitionedSparkTable(
       Configuration conf = spark.sessionState().newHadoopConf();
       MetricsConfig metricsConfig = MetricsConfig.fromProperties(targetTable.properties());
       String nameMappingString = targetTable.properties().get(TableProperties.DEFAULT_NAME_MAPPING);
-      NameMapping nameMapping = nameMappingString != null ? NameMappingParser.fromJson(nameMappingString) : null;
+      NameMapping nameMapping = nameMappingString != null ?
+          NameMappingParser.fromJson(nameMappingString) : MappingUtil.create(targetTable.schema());
Review comment:
I think we should change to use a name mapping by default in a separate PR. For now, let's just support a name mapping when importing ORC data.

My rationale is that I think the two are independently useful. Someone might want to change the default, but not cherry-pick the ORC changes. Similarly, someone might want to use a name mapping with ORC, but not want to default to name mapping.

Also, I think that we would want to default to name mapping slightly differently. It doesn't make sense to me to create a temporary mapping that is used only for metrics here, unless that mapping is also used to read the data. So I would prefer to update tables when importing and add a mapping to the table metadata by default, if it is not already there. A rough sketch of that approach is below.
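Something along these lines (the resolveNameMapping helper name is only for illustration; the types and the DEFAULT_NAME_MAPPING property key are the same ones used in the diff, and persisting the mapping is the part that differs from this PR):

import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.mapping.MappingUtil;
import org.apache.iceberg.mapping.NameMapping;
import org.apache.iceberg.mapping.NameMappingParser;

// Sketch only: resolve the table's name mapping, creating one from the current
// schema and persisting it in table metadata if the property is not already set,
// so the same mapping is used for import-time metrics and for later reads.
private static NameMapping resolveNameMapping(Table targetTable) {
  String nameMappingString = targetTable.properties().get(TableProperties.DEFAULT_NAME_MAPPING);
  if (nameMappingString != null) {
    return NameMappingParser.fromJson(nameMappingString);
  }

  // no mapping yet: derive one from the schema and store it as the table default
  NameMapping nameMapping = MappingUtil.create(targetTable.schema());
  targetTable.updateProperties()
      .set(TableProperties.DEFAULT_NAME_MAPPING, NameMappingParser.toJson(nameMapping))
      .commit();
  return nameMapping;
}

That way the mapping used to collect metrics during import is the same one the table will use to read the imported files, instead of a throwaway mapping that only exists for this code path.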