yanghua commented on a change in pull request #2430:
URL: https://github.com/apache/hudi/pull/2430#discussion_r557306549
##########
File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
##########
@@ -81,16 +103,50 @@ public static DFSPropertiesConfiguration readConfig(FileSystem fs, Path cfgPath,
     return conf;
   }
- public static Configuration getHadoopConf() {
- return new Configuration();
+ public static org.apache.hadoop.conf.Configuration getHadoopConf() {
+    // create a Hadoop Configuration from the configured hadoop conf directory.
+    org.apache.hadoop.conf.Configuration hadoopConf = null;
+    for (String possibleHadoopConfPath : HadoopUtils.possibleHadoopConfPaths(new Configuration())) {
Review comment:
> The method first looks for the path specified by `fs.hdfs.hadoopconf`, then for the directories `HADOOP_CONF_DIR`, `HADOOP_HOME/conf`, and `HADOOP_HOME/etc/hadoop` from the system environment.
>
I had read the source code before raising this concern.
> Even if storage is separated from computing, the `FileSystem` we create is still correct, as long as we supply the hadoop conf files correctly.
>
> In any case, we should not pass an empty hadoop configuration.
What I mean is: should the user's explicit parameter assignment take the highest priority, above default conventions that some users may not even know about?
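To make the priority question concrete, here is a minimal, self-contained sketch of the lookup order under discussion. The class name `HadoopConfPathResolver`, the method `resolveHadoopConfPaths`, and the explicit-path parameter are hypothetical illustrations, not the actual Flink `HadoopUtils` API; the point is only that an explicitly configured directory is consulted before environment-derived defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the conf-directory lookup order debated above.
public class HadoopConfPathResolver {

  // Returns candidate hadoop conf directories, highest priority first:
  //   1. the explicitly configured directory (e.g. via fs.hdfs.hadoopconf)
  //   2. the HADOOP_CONF_DIR environment variable
  //   3. HADOOP_HOME/conf and HADOOP_HOME/etc/hadoop
  public static List<String> resolveHadoopConfPaths(String explicitConfDir,
                                                    Map<String, String> env) {
    List<String> candidates = new ArrayList<>();
    if (explicitConfDir != null && !explicitConfDir.isEmpty()) {
      candidates.add(explicitConfDir);
    }
    String confDir = env.get("HADOOP_CONF_DIR");
    if (confDir != null) {
      candidates.add(confDir);
    }
    String hadoopHome = env.get("HADOOP_HOME");
    if (hadoopHome != null) {
      candidates.add(hadoopHome + "/conf");
      candidates.add(hadoopHome + "/etc/hadoop");
    }
    return candidates;
  }

  public static void main(String[] args) {
    // With both an explicit directory and environment defaults set,
    // the explicit directory comes first.
    Map<String, String> env = Map.of(
        "HADOOP_CONF_DIR", "/etc/hadoop/conf",
        "HADOOP_HOME", "/opt/hadoop");
    System.out.println(resolveHadoopConfPaths("/user/my-conf", env));
  }
}
```

Under this ordering, a user's explicit setting always wins, while the environment-based defaults remain a fallback for users who never set anything, which is one way to resolve the concern without ever passing an empty configuration.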
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]