prashantwason opened a new pull request, #18084: URL: https://github.com/apache/hudi/pull/18084
### Describe the issue this Pull Request addresses

This PR adds the Spark application ID (`spark_application_id`) to the commit metadata of all completed instants, enabling users to trace which Spark application performed operations on a Hudi dataset.

Closes #539

### Summary and Changelog

**Summary:** Adds engine-specific metadata (`spark_application_id`) to all commit types, including writes, compaction, clustering, and cleaning operations.

**Changelog:**

- Added `getEngineCommitMetadata()` method to `HoodieEngineContext` base class that returns engine-specific metadata as a Map
- Implemented `getEngineCommitMetadata()` in `HoodieSparkEngineContext` to return `spark_application_id`
- Updated `SparkRDDWriteClient` to merge engine metadata with extra metadata during commits
- Updated `BaseSparkCommitActionExecutor` to add engine metadata to write commits
- Updated `BaseCommitActionExecutor` to add engine metadata to clustering commits
- Updated `RunCompactionActionExecutor` to add engine metadata to compaction/log compaction commits
- Updated `CleanActionExecutor` to add engine metadata to clean commits

### Impact

- **Public API:** Added `getEngineCommitMetadata()` method to `HoodieEngineContext`, which can be overridden by engine implementations
- **User-facing:** The `spark_application_id` field will now appear in commit metadata for all operations (writes, compaction, clustering, cleaning)
- **Extensibility:** Other engines (e.g., Flink) can implement `getEngineCommitMetadata()` to add their own identifiers

### Risk Level

low - This is an additive change that only adds optional metadata to commits. No existing behavior is modified.

### Documentation Update

none - This is an internal metadata enhancement that does not require documentation updates.
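
For readers who want a concrete picture of the pattern described in the changelog, here is a minimal, self-contained sketch. It is not the code in this PR: the class names (`EngineContextSketch`, `SparkEngineContextSketch`, `CommitMetadataSketch`) and the `mergeExtraMetadata` helper are hypothetical stand-ins, the application ID is passed in as a plain string rather than read from a running Spark context, and letting the engine value win on a key collision is an assumption, since the description above does not say which side takes precedence.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the engine context base class (not the real HoodieEngineContext).
abstract class EngineContextSketch {
    // Default behavior: an engine contributes no extra commit metadata.
    Map<String, String> getEngineCommitMetadata() {
        return Collections.emptyMap();
    }
}

// Hypothetical stand-in for the Spark engine context.
class SparkEngineContextSketch extends EngineContextSketch {
    private final String applicationId;

    // Assumption: the ID is injected here instead of being read from Spark.
    SparkEngineContextSketch(String applicationId) {
        this.applicationId = applicationId;
    }

    @Override
    Map<String, String> getEngineCommitMetadata() {
        return Collections.singletonMap("spark_application_id", applicationId);
    }
}

final class CommitMetadataSketch {
    // Merges caller-supplied extra metadata with engine metadata into a new map.
    // Assumption: engine metadata overrides a caller-supplied entry with the same key.
    static Map<String, String> mergeExtraMetadata(Map<String, String> extraMetadata,
                                                  EngineContextSketch context) {
        Map<String, String> merged = new HashMap<>();
        if (extraMetadata != null) {
            merged.putAll(extraMetadata);
        }
        merged.putAll(context.getEngineCommitMetadata());
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> extra = new HashMap<>();
        extra.put("user_key", "user_value");
        Map<String, String> merged =
            mergeExtraMetadata(extra, new SparkEngineContextSketch("application_123"));
        System.out.println(merged.get("spark_application_id"));
    }
}
```

Keeping the default in the base class as an empty map means engines that do not override the method leave commit metadata unchanged, which matches the additive, low-risk intent stated above.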

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
