kbendick commented on a change in pull request #2792:
URL: https://github.com/apache/iceberg/pull/2792#discussion_r667174160
##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkUtil.java
##########
@@ -99,4 +103,30 @@ public static void validatePartitionTransforms(PartitionSpec spec) {
}
}
}
+
+  /**
+   * Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be
+   * set via spark.sql.catalog.$catalogName.hadoop.*
+   *
+   * The SparkCatalog allows for hadoop configurations to be overridden per catalog, by setting
+   * them on the SQLConf, where the following will add the property "fs.default.name" with value
+   * "hdfs://hanksnamenode:8020" to the catalog's hadoop configuration.
+   *   SparkSession.builder()
+   *     .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
+   *     .getOrCreate()
+   * @param spark The current Spark session
+   * @param catalogName Name of the catalog to find overrides for.
+   * @return the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied.
+   */
+  public static Configuration hadoopConfCatalogOverrides(SparkSession spark, String catalogName) {
+    // Find keys for the catalog intended to be hadoop configurations
+    final String hadoopConfCatalogPrefix = String.format("%s.%s.%s", ICEBERG_CATALOG_PREFIX, catalogName, "hadoop.");
+    Configuration conf = spark.sessionState().newHadoopConf();
+    spark.sqlContext().conf().settings().forEach((k, v) -> {
+      if (v != null && k.startsWith(hadoopConfCatalogPrefix)) {
Review comment:
I was able to put a `null` key into a `scala.Map[String, String]`.
```
scala> var nullString: String = null
nullString: String = null

scala> val x = scala.collection.mutable.Map[String, String]()
x: scala.collection.mutable.Map[String,String] = Map()
scala> x += nullString -> "5"
res3: scala.collection.mutable.Map[String,String] = Map(null -> 5)
scala> x
res4: scala.collection.mutable.Map[String,String] = Map(null -> 5)
```
However, putting a `null` key into the hadoop configuration throws:
```
scala> var config = spark.sessionState.newHadoopConf
config: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml

scala> config.set(null, "10")
java.lang.IllegalArgumentException: Property name must not be null
  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1337)
  ... 47 elided
```
I think that `settings` shouldn't return a `null` key, but I can add a check just in case, if we think that's worthwhile.
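For illustration, here is a minimal, self-contained sketch of the guarded copy loop being discussed. Plain `java.util.Map`s stand in for the SQLConf settings and the Hadoop `Configuration`, and `applyOverrides` is a hypothetical helper name, not the PR's actual method; the point is only that filtering out `null` keys (in addition to `null` values) before the copy avoids the `IllegalArgumentException` shown above.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogConfDemo {

  // Hypothetical stand-in for copying catalog-scoped settings into a
  // Hadoop-conf-like map. Skips null keys and null values, then strips
  // the catalog prefix, mirroring the check discussed in the review.
  static Map<String, String> applyOverrides(Map<String, String> settings, String prefix) {
    Map<String, String> conf = new HashMap<>();
    settings.forEach((k, v) -> {
      if (k != null && v != null && k.startsWith(prefix)) {
        conf.put(k.substring(prefix.length()), v);
      }
    });
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> settings = new HashMap<>();
    settings.put("spark.sql.catalog.hank.hadoop.fs.default.name", "hdfs://hanksnamenode:8020");
    // HashMap tolerates a null key, but Configuration.set would throw on it,
    // so the guard above must drop this entry.
    settings.put(null, "ignored");
    System.out.println(applyOverrides(settings, "spark.sql.catalog.hank.hadoop."));
  }
}
```

With the `k != null` guard in place, the null-keyed entry is silently dropped and only `fs.default.name` survives the copy.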
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]