suryaprasanna opened a new pull request, #17862:
URL: https://github.com/apache/hudi/pull/17862
### Describe the issue this Pull Request addresses
This PR fixes an issue in Hudi CLI where attempting to create multiple
`JavaSparkContext` instances would fail with an "Only one SparkContext should be
running in this JVM" error. The problem occurred because the code directly
instantiated a new `JavaSparkContext` without checking whether one already
existed. Because `initJavaSparkContext()` is a public method, engineers who
write custom hudi-cli commands that call it can hit the same error.
The fix uses `SparkSession.builder().getOrCreate()`, which reuses an existing
context when one is present, and then obtains the `JavaSparkContext` from the
session, as sketched below.
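A minimal sketch of the fixed pattern, again assuming the method takes a
`SparkConf` and lives in an illustrative wrapper class (the exact signature in
`SparkUtil` may differ):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class SparkUtilFixedSketch {
  public static JavaSparkContext initJavaSparkContext(SparkConf sparkConf) {
    // getOrCreate() reuses the active SparkSession (and its underlying
    // SparkContext) if one exists; it creates a new one only when needed.
    SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
    return JavaSparkContext.fromSparkContext(spark.sparkContext());
  }
}
```

With this pattern, calling the method twice in the same JVM returns a
`JavaSparkContext` backed by the same underlying `SparkContext` instead of
throwing.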
### Summary and Changelog
Users can now run multiple Hudi CLI commands that require Spark without
encountering SparkContext initialization errors.
**Changes:**
- Modified `SparkUtil.initJavaSparkContext()` to use
`SparkSession.builder().getOrCreate()` instead of directly creating
`JavaSparkContext`
- Added `SparkSession` import
- Changed from `new JavaSparkContext(sparkConf)` to
`JavaSparkContext.fromSparkContext(spark.sparkContext())`
### Impact
None - this change only affects context creation logic.
### Risk Level
**Low** - This change uses the recommended pattern for obtaining a
SparkContext via SparkSession, which is more robust than direct instantiation.
The `getOrCreate()` method ensures we reuse existing contexts when available
and only create new ones when necessary.
### Documentation Update
None
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable