kbendick commented on a change in pull request #2792:
URL: https://github.com/apache/iceberg/pull/2792#discussion_r667174160
##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkUtil.java
##########
@@ -99,4 +103,30 @@ public static void validatePartitionTransforms(PartitionSpec spec) {
}
}
}
+
+  /**
+   * Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be
+   * set via spark.sql.catalog.$catalogName.hadoop.*
+   *
+   * The SparkCatalog allows for hadoop configurations to be overridden per catalog, by setting
+   * them on the SQLConf, where the following will add the property "fs.default.name" with value
+   * "hdfs://hanksnamenode:8020" to the catalog's hadoop configuration.
+   *   SparkSession.builder()
+   *     .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
+   *     .getOrCreate()
+   * @param spark The current Spark session
+   * @param catalogName Name of the catalog to find overrides for.
+   * @return the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied.
+   */
+  public static Configuration hadoopConfCatalogOverrides(SparkSession spark, String catalogName) {
+    // Find keys for the catalog intended to be hadoop configurations
+    final String hadoopConfCatalogPrefix = String.format("%s.%s.%s", ICEBERG_CATALOG_PREFIX, catalogName, "hadoop.");
+    Configuration conf = spark.sessionState().newHadoopConf();
+    spark.sqlContext().conf().settings().forEach((k, v) -> {
+      if (v != null && k.startsWith(hadoopConfCatalogPrefix)) {
Review comment:
I was able to put a `null` key into a `scala.Map[String, String]`.
```
scala> var nullString: String = null
nullString: String = null

scala> val x = scala.collection.mutable.Map[String, String]()
x: scala.collection.mutable.Map[String,String] = Map()
scala> x += nullString -> "5"
res3: scala.collection.mutable.Map[String,String] = Map(null -> 5)
scala> x
res4: scala.collection.mutable.Map[String,String] = Map(null -> 5)
```
However, putting a `null` key into the hadoop configuration throws:
```
scala> var config = spark.sessionState.newHadoopConf
config: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml

scala> config.set(null, "10")
java.lang.IllegalArgumentException: Property name must not be null
  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1337)
  ... 47 elided
```
I think that `settings` shouldn't return a `null` key, but I can add a check just in case, if we think that's worthwhile.
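For illustration, here is a minimal, self-contained sketch of the guarded copy loop being discussed. Plain `java.util.Map`s stand in for the SQLConf settings and the Hadoop `Configuration`, and `applyOverrides` is a hypothetical helper name, not the PR's actual method; the point is only that filtering out `null` keys (in addition to `null` values) before the copy avoids the `IllegalArgumentException` shown above.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogConfDemo {

  // Hypothetical stand-in for copying catalog-scoped settings into a
  // Hadoop-conf-like map. Skips null keys and null values, then strips
  // the catalog prefix, mirroring the check discussed in the review.
  static Map<String, String> applyOverrides(Map<String, String> settings, String prefix) {
    Map<String, String> conf = new HashMap<>();
    settings.forEach((k, v) -> {
      if (k != null && v != null && k.startsWith(prefix)) {
        conf.put(k.substring(prefix.length()), v);
      }
    });
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> settings = new HashMap<>();
    settings.put("spark.sql.catalog.hank.hadoop.fs.default.name", "hdfs://hanksnamenode:8020");
    // HashMap tolerates a null key, but Configuration.set would throw on it,
    // so the guard above must drop this entry.
    settings.put(null, "ignored");
    System.out.println(applyOverrides(settings, "spark.sql.catalog.hank.hadoop."));
  }
}
```

With the `k != null` guard in place, the null-keyed entry is silently dropped and only `fs.default.name` survives the copy.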
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]