yihua opened a new pull request, #7790: URL: https://github.com/apache/hudi/pull/7790
### Change Logs

The Hudi CLI commands which require launching Spark cannot be executed in the Hudi CLI shell with `hudi-cli-bundle`:

```
savepoint create --commit <latest-commit-timestamp> --sparkMaster local
savepoint delete --commit <latest-commit-timestamp> --sparkMaster local
savepoint create --commit <latest-commit-timestamp> --sparkMaster local
downgrade table --toVersion 3 --sparkMaster local
upgrade table --toVersion 5 --sparkMaster local
compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
```

Sample error message:

```
30977 [Thread-4] INFO org.apache.hudi.cli.utils.InputStreamConsumer [] - Error: Failed to load org.apache.hudi.cli.commands.SparkMain: org/apache/hudi/common/engine/HoodieEngineContext
```

The root cause is that `hudi-cli-bundle` excludes classes that are already present in `hudi-spark*-bundle` (for example, those from the `hudi-common` module), but `hudi-spark*-bundle` is not added to the Spark launcher, so the launched Spark job fails because those classes cannot be found. This PR fixes the problem by adding the `hudi-spark*-bundle` jar, specified by the environment variable `SPARK_BUNDLE_JAR`, to the Spark launcher. Note that `SPARK_BUNDLE_JAR` is required when using `hudi-cli-bundle`.

### Impact

Ensures that Hudi CLI commands which require launching Spark can be executed with `hudi-cli-bundle`. The CLI commands above are verified to work locally with this fix.

### Risk level

Low.

### Documentation Update

N/A

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
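As a usage sketch of the workflow this fix enables (the jar paths and the launch script name below are illustrative assumptions, not taken from this PR; substitute the actual files from your Hudi distribution), the idea is to point `SPARK_BUNDLE_JAR` at the matching Spark bundle before starting the CLI shell:

```shell
# Illustrative paths; replace with the real bundle jars for your Hudi/Spark versions.
export CLI_BUNDLE_JAR=/opt/hudi/hudi-cli-bundle.jar      # hypothetical location
export SPARK_BUNDLE_JAR=/opt/hudi/hudi-spark-bundle.jar  # required when using hudi-cli-bundle

# Start the Hudi CLI shell (script name assumed). With this fix, Spark-backed
# commands (savepoint, upgrade/downgrade table, compaction schedule) work because
# the jar in SPARK_BUNDLE_JAR is added to the Spark launcher, supplying the
# classes that hudi-cli-bundle deliberately excludes.
./hudi-cli-with-bundle.sh
```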
