yihua commented on code in PR #18204:
URL: https://github.com/apache/hudi/pull/18204#discussion_r3035034031


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HudiHiveSyncJob.java:
##########


Review Comment:
   **Line 110:** 🤖 **Bug (still open from previous review):** The short alias 
`"bff"` is missing its leading dash. JCommander requires parameter names to 
start with `-` or `--`. As written, `-bff` or `--bff` on the command line won't 
be recognized, and bare `bff` is not a valid prefix.
   
   ```suggestion
       @Parameter(names = {"--base-file-format", "-bff"}, description = "Base 
file format of the dataset")
   ```



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HudiHiveSyncJob.java:
##########


Review Comment:
   **Line 82:** 🤖 **Bug (still open from previous review):** The Spark master 
is hardcoded to `"local[2]"`. When this job is submitted via `spark-submit` to 
a YARN/K8s/standalone cluster, `UtilHelpers.buildSparkContext` will override 
the cluster-provided master with `local[2]`, running everything on the driver 
node only.
   
   Other Hudi utilities (e.g., `HoodieCompactor`, `HoodieCleaner`) pass 
`"local[*]"` as a fallback and respect `spark.master` from the environment. 
This should follow the same pattern, or better yet, not set a master at all and 
let `spark-submit` control it.



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HudiHiveSyncJob.java:
##########


Review Comment:
   **Line 82:** 🤖 The `JavaSparkContext` created in `main()` is never stopped. 
If `run()` throws an exception, the SparkContext leaks. Wrap in try-finally:
   
   ```java
   public static void main(String[] args) throws IOException {
     final Config cfg = new Config();
     new JCommander(cfg, null, args);
     LOG.info("Cfg received: {}", cfg);
     JavaSparkContext jsc = UtilHelpers.buildSparkContext("HudiHiveSyncJob", 
"local[*]", true);
     try {
       new HudiHiveSyncJob(jsc, cfg).run();
     } finally {
       jsc.stop();
     }
   }
   ```
   
   (The test correctly handles this with try/finally — `main()` should do the 
same.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to