prashantwason opened a new pull request, #18084:
URL: https://github.com/apache/hudi/pull/18084

   ### Describe the issue this Pull Request addresses
   
   This PR adds the Spark application ID (`spark_application_id`) to the commit 
metadata of all completed instants, enabling users to trace which Spark 
application performed operations on a Hudi dataset.
   
   Closes #539
   
   ### Summary and Changelog
   
   **Summary:** Adds engine-specific metadata (`spark_application_id`) to all 
commit types, including writes, compaction, clustering, and cleaning operations.
   
   **Changelog:**
   - Added `getEngineCommitMetadata()` method to `HoodieEngineContext` base 
class that returns engine-specific metadata as a Map
   - Implemented `getEngineCommitMetadata()` in `HoodieSparkEngineContext` to 
return `spark_application_id`
   - Updated `SparkRDDWriteClient` to merge engine metadata with extra metadata 
during commits
   - Updated `BaseSparkCommitActionExecutor` to add engine metadata to write 
commits
   - Updated `BaseCommitActionExecutor` to add engine metadata to clustering 
commits
   - Updated `RunCompactionActionExecutor` to add engine metadata to 
compaction/log compaction commits
   - Updated `CleanActionExecutor` to add engine metadata to clean commits
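
   The changelog above can be illustrated with a minimal, self-contained sketch. It does not use the real Hudi classes: `EngineContext` and `SparkEngineContext` are hypothetical stand-ins for `HoodieEngineContext` and `HoodieSparkEngineContext`, and `mergeExtraMetadata` is an illustrative helper, not a method from this PR. The application ID is passed in as a plain string here, and the choice to let user-supplied extra metadata win on a key collision is an assumption, since the description does not state the precedence.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EngineCommitMetadataSketch {

  // Hypothetical stand-in for HoodieEngineContext.
  // By default an engine contributes no metadata.
  static class EngineContext {
    Map<String, String> getEngineCommitMetadata() {
      return Collections.emptyMap();
    }
  }

  // Hypothetical stand-in for HoodieSparkEngineContext.
  // The application ID is injected as a string to keep the sketch
  // free of any Spark dependency.
  static class SparkEngineContext extends EngineContext {
    private final String applicationId;

    SparkEngineContext(String applicationId) {
      this.applicationId = applicationId;
    }

    @Override
    Map<String, String> getEngineCommitMetadata() {
      return Collections.singletonMap("spark_application_id", applicationId);
    }
  }

  // Illustrative merge step: start from the engine metadata, then overlay
  // the user's extra metadata. Assumption: user-supplied keys take
  // precedence if the same key appears in both maps.
  static Map<String, String> mergeExtraMetadata(
      EngineContext context, Map<String, String> userExtraMetadata) {
    Map<String, String> merged = new HashMap<>(context.getEngineCommitMetadata());
    if (userExtraMetadata != null) {
      merged.putAll(userExtraMetadata);
    }
    return merged;
  }

  public static void main(String[] args) {
    EngineContext context = new SparkEngineContext("application_0001");
    Map<String, String> userExtra = Collections.singletonMap("pipeline", "nightly");

    Map<String, String> merged = mergeExtraMetadata(context, userExtra);
    System.out.println("spark_application_id = " + merged.get("spark_application_id"));
    System.out.println("pipeline = " + merged.get("pipeline"));
  }
}
```

   In this sketch, an engine that does not override `getEngineCommitMetadata()` contributes an empty map, so the merge leaves the user's extra metadata unchanged.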
   
   ### Impact
   
   - **Public API:** Added `getEngineCommitMetadata()` method to 
`HoodieEngineContext` which can be overridden by engine implementations
   - **User-facing:** The `spark_application_id` field will now appear in 
commit metadata for all operations (writes, compaction, clustering, cleaning)
   - **Extensibility:** Other engines (e.g., Flink) can implement 
`getEngineCommitMetadata()` to add their own identifiers
   
   ### Risk Level
   
   low - This is an additive change that only adds optional metadata to 
commits. No existing behavior is modified.
   
   ### Documentation Update
   
   none - This is an internal metadata enhancement that does not require 
documentation updates.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
