danny0405 opened a new pull request, #18776:
URL: https://github.com/apache/hudi/pull/18776

   ```markdown
   ### Describe the issue this Pull Request addresses
   
   Several Hudi file-writing paths can allocate output streams or file writers 
before all initialization/write steps complete. If a later constructor step, 
write operation, flush, or close-time metadata operation throws before the 
normal close path is reached, the underlying writer or stream may remain open.
   
   This PR tightens cleanup for Parquet, HFile, binary-copy, Spark/Flink row 
writers, bootstrap index writers, and write handles so writer-owned resources 
are closed when failures occur. There are no storage format, public API, or 
config changes.
   
   ### Summary and Changelog
   
   This change makes file-writer cleanup more exception-safe across production 
write paths and adds focused coverage for the binary-copy and create-handle 
failure cases.
   
   #### Working tree: close file writers on failure paths
   - Close raw output streams if Parquet writer construction fails in 
`HoodieParquetStreamWriter`, `HoodieSparkParquetStreamWriter`, and 
`HoodieRowDataParquetOutputStreamWriter`.
   - Close the `HoodieAvroHFileWriter` output stream directly only when writer 
construction does not take ownership of it.
   - Harden `HFileBootstrapIndexWriter.begin()` and `close()` so partially 
initialized HFile writers are closed and close failures are preserved.
   - Ensure `ParquetUtils.serializeRecordsToLogBlock` closes the 
`HoodieFileWriter` with try-with-resources.
   - Ensure `SparkHelpers` closes `HoodieAvroParquetWriter` in a `finally` 
block.
   - Harden `HoodieParquetBinaryCopyBase` by closing the Parquet writer when 
`start()` fails, clearing writer state after close, and closing column writers 
in `maskColumn` / `addNullColumn` when flush/write fails.
   - Add compatibility handling for `ParquetFileWriter.close()` via reflective 
method lookup only when the runtime class exposes `close()`.
   
   #### Working tree: close write-handle owned writers on failure paths
   - `BaseCreateHandle` now closes and clears `fileWriter` when record writing 
fails and write failures are not ignored.
   - `HoodieAppendHandle` now closes the log writer when record write, 
compaction write, or close-time append/flush fails.
   - `HoodieWriteMergeHandle` now closes and clears `fileWriter` when 
`writeIncomingRecords()` or close-time operations fail.
   - `HoodieSortedMergeHandle` now routes pending insert writes through 
`writeIncomingRecords()` so the base merge-handle close protection applies.
   - `HoodieBinaryCopyHandle` now closes the binary copier if `binaryCopy()` 
fails before the normal close path.
   
   #### Working tree: tests
   - Added `TestHoodieParquetBinaryCopyBaseSchemaEvolution` coverage for:
     - missing `ParquetFileWriter.close()` compatibility behavior,
     - invoking `close()` when present,
     - clearing the writer when `end()` fails.
   - Added `TestHoodieCreateHandle#testFileWriterClosedWhenDoWriteFails` to 
verify `fileWriter` is cleared after write failure.
   
   ### Impact
   
   No public API, config, or storage format changes. The impact is limited to 
safer resource cleanup in failure paths for file-writing and write-handle code. 
Successful write behavior should remain unchanged.
   
   Affected paths include Hudi create/append/merge/binary-copy handles, Parquet 
stream writers, HFile writers, bootstrap index writers, and Spark/Flink writer 
integrations.
   
   ### Risk Level
   
   medium
   
   This touches core write-path cleanup logic across multiple modules, 
including append, merge, create, binary-copy, Parquet, and HFile paths. The 
behavioral intent is narrow: close resources on exception paths and preserve 
original failures by adding close failures as suppressed where applicable.
   
   Validation performed:
   - `git diff --check` passes.
   - `mvn -pl hudi-hadoop-common -DskipITs -Dcheckstyle.skip -Dspotbugs.skip -Dtest=TestHoodieParquetBinaryCopyBaseSchemaEvolution test` passed with 11 tests.
   - `mvn -pl hudi-hadoop-common,hudi-common -DskipTests -DskipITs -Dcheckstyle.skip -Dspotbugs.skip compile` passed.
   - `mvn -pl hudi-client/hudi-client-common -DskipITs -Dcheckstyle.skip -Dspotbugs.skip -Dtest=TestHoodieCreateHandle#testFileWriterClosedWhenDoWriteFails test` was attempted but blocked during compile by an unrelated existing error in `StreamingOffsetValidator.java:170` for missing `ValidationContext#getTotalWriteErrors()`.
   
   ### Documentation Update
   
   none
   
   This PR does not add or change user-facing configs, APIs, file formats, or 
documented behavior. It only improves cleanup behavior on internal failure 
paths.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   ```
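
The "close raw output streams if writer construction fails" item, together with the note about adding close failures as suppressed, can be illustrated with a minimal sketch. This is not the PR's code: `TrackingStream`, `SketchWriter`, and `open` are hypothetical stand-ins, and the real Hudi writers have different constructors and ownership rules.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseOnFailureSketch {

  /** Hypothetical stand-in for a raw output stream; records whether close() ran. */
  static final class TrackingStream implements Closeable {
    boolean closed = false;
    private final boolean failOnClose;

    TrackingStream(boolean failOnClose) {
      this.failOnClose = failOnClose;
    }

    @Override
    public void close() throws IOException {
      closed = true;
      if (failOnClose) {
        throw new IOException("close failed");
      }
    }
  }

  /** Hypothetical writer whose constructor can fail after the stream already exists. */
  static final class SketchWriter implements Closeable {
    private final TrackingStream out;

    SketchWriter(TrackingStream out, boolean failInit) throws IOException {
      if (failInit) {
        throw new IOException("init failed");
      }
      this.out = out;
    }

    @Override
    public void close() throws IOException {
      out.close();
    }
  }

  /**
   * Builds the writer. If construction throws, the stream is closed here because
   * no writer took ownership of it; a failure during that close is attached to the
   * original exception as suppressed so the original failure is what callers see.
   */
  static SketchWriter open(TrackingStream out, boolean failInit) throws IOException {
    try {
      return new SketchWriter(out, failInit);
    } catch (Throwable original) {
      try {
        out.close();
      } catch (Throwable closeFailure) {
        original.addSuppressed(closeFailure);
      }
      throw original;
    }
  }
}
```

On the success path the stream stays open and the returned writer owns it, which mirrors the PR's stated goal that successful write behavior remains unchanged.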

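The compatibility handling for `ParquetFileWriter.close()` is described only as a reflective method lookup used when the runtime class exposes `close()`. A rough sketch of that idea, assuming a public no-argument `close()` method, might look like the following. `ReflectiveCloseSketch`, `closeIfSupported`, `WithClose`, and `WithoutClose` are hypothetical names for illustration, and the PR's actual helper may differ.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ReflectiveCloseSketch {

  /** Hypothetical writer class that exposes a public close(). */
  static final class WithClose {
    boolean closed = false;

    public void close() {
      closed = true;
    }
  }

  /** Hypothetical writer class from a runtime version without close(). */
  static final class WithoutClose {
  }

  /**
   * Calls close() on the target if its runtime class has a public no-arg close().
   * Returns true if close() was invoked, false if the method does not exist.
   */
  static boolean closeIfSupported(Object writer) throws Exception {
    Method close;
    try {
      close = writer.getClass().getMethod("close");
    } catch (NoSuchMethodException e) {
      // Runtime class has no close(); nothing to call.
      return false;
    }
    try {
      close.invoke(writer);
    } catch (InvocationTargetException e) {
      // Unwrap so callers see the failure thrown by close() itself.
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      if (cause instanceof Error) {
        throw (Error) cause;
      }
      throw e;
    }
    return true;
  }
}
```

Returning a boolean instead of throwing on a missing method lets the caller fall back to whatever cleanup the older runtime class supports.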
