LuciferYang opened a new pull request, #3450:
URL: https://github.com/apache/parquet-java/pull/3450

   ### Rationale for this change
   Fixes #3011
   
   After a write error (e.g. OOM during page flush), 
`InternalParquetRecordWriter` sets its `aborted` flag to true and re-throws the 
exception. However, subsequent calls to `write()` are silently accepted without 
checking this flag. Since `close()` skips flushing when `aborted` is true, all 
data written after the error is silently discarded, producing a corrupted 
Parquet file without a footer. Users only discover the corruption when they 
attempt to read the file later.
   
   
   ### What changes are included in this PR?
   Added an `aborted` state check at the beginning of `write()`. If the writer 
has been aborted due to a previous error, an `IOException` is thrown 
immediately with a clear error message, preventing further writes to a writer 
in an undefined state.
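   
   The guard described above can be sketched roughly as follows. This is a simplified stand-in, not the actual `InternalParquetRecordWriter` code: the `SketchWriter` class, the `simulateFailure` hook, and the exception message text are all hypothetical and exist only to illustrate the abort check at the top of `write()` and the flush-skipping `close()`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class AbortGuardSketch {

    /** Simplified stand-in for a record writer; not the real InternalParquetRecordWriter. */
    static class SketchWriter implements AutoCloseable {
        private final List<String> buffered = new ArrayList<>();
        private final List<String> flushed = new ArrayList<>();
        private boolean aborted = false;

        /** simulateFailure is a hypothetical hook standing in for an OOM or I/O error. */
        void write(String record, boolean simulateFailure) throws IOException {
            // The guard: reject writes once a previous write has failed.
            if (aborted) {
                throw new IOException("Writer is aborted after a previous error (hypothetical message)");
            }
            try {
                if (simulateFailure) {
                    throw new IOException("simulated flush failure");
                }
                buffered.add(record);
            } catch (IOException | RuntimeException e) {
                aborted = true; // remember the failure, then re-throw
                throw e;
            }
        }

        /** Skips the flush when aborted, and never throws in this sketch. */
        @Override
        public void close() {
            if (!aborted) {
                flushed.addAll(buffered);
            }
            buffered.clear();
        }

        boolean isAborted() {
            return aborted;
        }

        List<String> flushedRecords() {
            return flushed;
        }
    }

    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter();
        try {
            writer.write("row-1", false);
            writer.write("row-2", true); // fails and marks the writer aborted
        } catch (IOException e) {
            System.out.println("first failure: " + e.getMessage());
        }
        try {
            writer.write("row-3", false); // rejected by the guard
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        writer.close(); // completes without throwing
        System.out.println("flushed records: " + writer.flushedRecords());
    }
}
```

   Without the check at the top of `write()`, the third call in `main` would be accepted and its record dropped at `close()`, which is the silent-loss scenario this PR targets.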
   
   ### Are these changes tested?
   Yes. Added `testWriteAfterAbortShouldThrow` in `TestParquetWriterError` that 
verifies:
   
   1. Writing to an aborted writer throws `IOException` with the expected 
message
   2. `close()` on an aborted writer completes without throwing
   
   All existing tests in `parquet-hadoop` pass without modification.
   
   ### Are there any user-facing changes?
   Yes. Users who previously caught write exceptions and continued writing to 
the same `ParquetWriter` will now receive an `IOException` on subsequent write 
attempts. This is an intentional change to prevent silent data loss: the 
correct behavior after a write failure is to discard the writer and create a 
new one.
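   
   For callers, the "discard and recreate" pattern might look like the sketch below. It does not use the real `ParquetWriter` API: `RecordSink`, `writeAll`, and `failOnceFactory` are hypothetical names, and the sketch assumes the whole batch can be replayed and that each new writer targets a fresh output (a real caller would likely also need to delete or replace the partially written file). Whether a retry is appropriate at all depends on the failure; after an OOM it usually is not.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class FreshWriterRetrySketch {

    /** Hypothetical minimal writer interface; the real ParquetWriter API differs. */
    interface RecordSink extends AutoCloseable {
        void write(String record) throws IOException;

        @Override
        void close() throws IOException;
    }

    /**
     * Writes all records using a writer from the factory. If any write fails,
     * the failed writer is closed and discarded, and the whole batch is
     * replayed on a brand-new writer. Returns the attempt number that succeeded.
     */
    static int writeAll(List<String> records, Supplier<RecordSink> factory, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (RecordSink sink = factory.get()) {
                for (String record : records) {
                    sink.write(record);
                }
                return attempt;
            } catch (IOException e) {
                last = e; // never reuse the failed writer; loop creates a new one
            }
        }
        throw last;
    }

    /** Demo factory: the first writer it creates fails on its second record. */
    static Supplier<RecordSink> failOnceFactory(List<String> output) {
        int[] created = {0};
        return () -> {
            final boolean failing = created[0]++ == 0;
            return new RecordSink() {
                private final List<String> pending = new ArrayList<>();
                private boolean aborted = false;

                @Override
                public void write(String record) throws IOException {
                    if (aborted) {
                        throw new IOException("writer aborted");
                    }
                    if (failing && pending.size() == 1) {
                        aborted = true;
                        throw new IOException("simulated flush failure");
                    }
                    pending.add(record);
                }

                @Override
                public void close() {
                    if (!aborted) {
                        output.addAll(pending); // only a healthy writer flushes
                    }
                }
            };
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> output = new ArrayList<>();
        int attempts = writeAll(Arrays.asList("a", "b", "c"), failOnceFactory(output), 3);
        System.out.println("attempts=" + attempts + " output=" + output);
    }
}
```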
   
   Closes #3011
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

