rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source
URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240442276
########## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java
##########
@@ -109,10 +113,19 @@ protected SparkSession lazySparkSession() {
     return lazySpark;
   }

-  protected Configuration lazyConf() {
+  protected Configuration lazyBaseConf() {
     if (lazyConf == null) {
       this.lazyConf = lazySparkSession().sparkContext().hadoopConfiguration();
     }
     return lazyConf;
   }
+
+  protected Configuration mergeIcebergHadoopConfs(Configuration baseConf, Map<String, String> options) {
+    Configuration resolvedConf = new Configuration(baseConf);
+    options.keySet().stream()
+        .filter(key -> key.startsWith("iceberg.hadoop"))
+        .filter(key -> baseConf.get(key) == null)

Review comment:
   The Hadoop configuration overrides Iceberg defaults because the Iceberg defaults are only applied when nothing else is set. Table properties are applied to the configuration next, overriding the environment, and then write options are applied to override the table configuration. So the environment config sits between the defaults and the table config, because the Configuration object is how those settings get passed along.

   Now that I look at this, I think Iceberg table properties currently have the highest precedence, because the write options are applied to the configuration before the table properties are (so that they are set when the table is looked up). We should either document the expectation that table properties always win, or apply the write options a second time. Probably the latter. What do you think?
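   For reference, here is a minimal sketch of the precedence I'm describing (Iceberg defaults < environment Configuration < table properties < write options), using the "apply the write options a second time" approach. The method names (`resolveConf`, `applyHadoopOptions`) and the prefix handling are illustrative assumptions, not the PR's actual code:

   ```java
   import org.apache.hadoop.conf.Configuration;
   import java.util.Map;

   // Hypothetical sketch of the intended precedence order; names are illustrative only.
   public class ConfPrecedenceSketch {

     // Builds the resolved Configuration so that write options win over table
     // properties, which in turn win over the environment Configuration.
     static Configuration resolveConf(Configuration envConf,
                                      Map<String, String> tableProps,
                                      Map<String, String> writeOptions) {
       // Environment config already overrides Iceberg defaults, since defaults
       // are only used when nothing is set in the Configuration.
       Configuration conf = new Configuration(envConf);

       applyHadoopOptions(conf, writeOptions);  // first pass: needed before the table is looked up
       applyHadoopOptions(conf, tableProps);    // table properties override the environment config
       applyHadoopOptions(conf, writeOptions);  // second pass: write options override table properties

       return conf;
     }

     // Copies only the keys carrying the "iceberg.hadoop" prefix into the
     // Configuration; the real implementation may strip the prefix first.
     static void applyHadoopOptions(Configuration conf, Map<String, String> options) {
       options.entrySet().stream()
           .filter(e -> e.getKey().startsWith("iceberg.hadoop"))
           .forEach(e -> conf.set(e.getKey(), e.getValue()));
     }
   }
   ```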