rdblue commented on a change in pull request #1558:
URL: https://github.com/apache/iceberg/pull/1558#discussion_r501874257
##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java
##########
@@ -67,14 +79,14 @@
* @return an Iceberg catalog loader
*/
protected CatalogLoader createCatalogLoader(String name, Map<String, String> options) {
- String catalogType = options.getOrDefault(ICEBERG_CATALOG_TYPE, "hive");
+ String catalogType = options.getOrDefault(ICEBERG_CATALOG_TYPE, ICEBERG_CATALOG_TYPE_HIVE);
switch (catalogType) {
- case "hive":
+ case ICEBERG_CATALOG_TYPE_HIVE:
Review comment:
These changes seem unrelated to loading hive-site.xml. Could you remove
them?
##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java
##########
@@ -100,12 +112,18 @@ protected CatalogLoader createCatalogLoader(String name, Map<String, String> opt
properties.add(HADOOP_WAREHOUSE_LOCATION);
properties.add(DEFAULT_DATABASE);
properties.add(BASE_NAMESPACE);
+ properties.add(HIVE_SITE_PATH);
Review comment:
If I understand, this is trying to add the hive site path to the
catalog's configuration properties? Why not load it from the environment?
That's the typical way to load these configuration files.
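As a sketch of that environment-based approach (a hypothetical helper, not existing Iceberg or Flink API; the variable names `HIVE_CONF_DIR` and `HADOOP_CONF_DIR` are the conventional Hadoop/Hive environment variables), resolving the file from the environment instead of a catalog property might look like:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HiveConfEnv {
    // Look up hive-site.xml in the directories the environment already
    // points at (HIVE_CONF_DIR first, then HADOOP_CONF_DIR). Returns null
    // when neither variable is set or the file is absent, so the caller
    // can fall back to other sources.
    static Path resolveHiveSite() {
        for (String var : new String[] {"HIVE_CONF_DIR", "HADOOP_CONF_DIR"}) {
            String dir = System.getenv(var);
            if (dir != null) {
                Path candidate = Paths.get(dir, "hive-site.xml");
                if (Files.exists(candidate)) {
                    return candidate;
                }
            }
        }
        return null;
    }
}
```

The resolved path could then be handed to Hadoop's `Configuration.addResource`, keeping the catalog properties free of deployment-specific file locations.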
##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java
##########
@@ -121,4 +139,58 @@ protected Catalog createCatalog(String name, Map<String, String> properties, Con
public static Configuration clusterHadoopConf() {
return HadoopUtils.getHadoopConfiguration(GlobalConfiguration.loadConfiguration());
}
+
+ private void loadHiveConf(Configuration configuration, Map<String, String> properties) {
+ String hiveConfPath = properties.get(HIVE_SITE_PATH);
+ Path path = new Path(hiveConfPath);
+ String scheme = getScheme(path);
+ // We can add more storage support later, like S3
+ switch (scheme) {
+ case HIVE_SITE_SCHEME_HDFS:
+ downloadFromHdfs(configuration, path);
+ break;
+ case HIVE_SITE_SCHEME_FILE:
+ loadLocalHiveConf(configuration, hiveConfPath);
+ break;
+ default:
+ throw new UnsupportedOperationException(
+ "Unsupported FileSystem for scheme: " + scheme);
+ }
+ }
+
+ private String getScheme(Path path) {
+ String scheme = path.toUri().getScheme();
+ if (scheme == null) {
+ // for case : /tmp/hive-site.xml
+ return HIVE_SITE_SCHEME_FILE;
+ } else {
+ return scheme;
+ }
+ }
+
+ private void loadLocalHiveConf(Configuration configuration, String localHiveSitePath) {
+ File file = new File(localHiveSitePath);
+ if (!file.exists()) {
+ throw new RuntimeException(localHiveSitePath + " doesn't exist. If running in application mode," +
+ " please provide an HDFS path for hive-site.xml");
+ } else {
+ configuration.addResource(localHiveSitePath);
+ }
+ }
+
+ private void downloadFromHdfs(Configuration configuration, Path hdfsHiveSitePath) {
Review comment:
I've never seen code like this needed. Usually, `hive-site.xml` is
loaded from the classpath. Here's some code that does it from our Spark build:
```scala
val configFile = someClassLoader.getResource("hive-site.xml")
if (configFile != null) {
hadoopConfiguration.addResource(configFile)
}
```
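For the Java side of this Flink module, the equivalent classpath lookup (a hedged sketch; `findHiveSite` is a hypothetical helper, and passing the result to Hadoop's `Configuration.addResource(URL)` is assumed) could be:

```java
import java.net.URL;

public class HiveSiteClasspath {
    // Locate hive-site.xml on the classpath, mirroring the Scala snippet
    // above. Returns null when the resource is not present, so the caller
    // can simply skip Configuration.addResource(url) in that case.
    static URL findHiveSite(ClassLoader loader) {
        return loader.getResource("hive-site.xml");
    }
}
```

This avoids any scheme-dispatch or download logic: whatever deployment mode ships the file (local dir, shipped archive, container image) just needs to place it on the classpath.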
##########
File path: flink/src/test/java/org/apache/iceberg/flink/FlinkCatalogTestBase.java
##########
@@ -100,6 +100,8 @@ public FlinkCatalogTestBase(String catalogName, String[] baseNamespace) {
config.put("type", "iceberg");
config.put(FlinkCatalogFactory.ICEBERG_CATALOG_TYPE, isHadoopCatalog ? "hadoop" : "hive");
config.put(FlinkCatalogFactory.HADOOP_WAREHOUSE_LOCATION, "file:" + warehouse);
+ String path = this.getClass().getClassLoader().getResource("hive-site.xml").getPath();
+ config.put(FlinkCatalogFactory.HIVE_SITE_PATH, path);
Review comment:
I think this only works because the file is in the local FS for tests,
and the FS is assumed to be "file" when there is no scheme. I don't think this
is right.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]