rdblue opened a new pull request, #5683:
URL: https://github.com/apache/iceberg/pull/5683

   This is an alternative fix for the Flink double close problem, #4168 and 
#5310.
   
   The original solution modified `S3OutputStream` so that every call to 
`close` after a failure would re-throw the original exception. That violates 
the [contract for 
`close`](https://docs.oracle.com/javase/8/docs/api/java/io/Closeable.html#close--),
 which states:
   
   > Closes this stream and releases any system resources associated with it. 
**If the stream is already closed then invoking this method has no effect.**
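
As a rough illustration of that contract, a stream can record that it has been closed before doing any work, so a later `close` returns immediately even if the first attempt failed. This is only a sketch: `IdempotentStreamSketch`, its `failOnClose` flag, and the `uploadAttempts` counter are hypothetical names for the example and are not taken from `S3OutputStream`.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stream used only to illustrate the Closeable contract;
// it is not Iceberg's S3OutputStream.
class IdempotentStreamSketch extends OutputStream {
  private final boolean failOnClose; // simulates a failed upload on close
  private boolean closed = false;
  int uploadAttempts = 0; // exposed so the example can be checked

  IdempotentStreamSketch(boolean failOnClose) {
    this.failOnClose = failOnClose;
  }

  @Override
  public void write(int b) throws IOException {
    if (closed) {
      throw new IOException("Stream is closed");
    }
    // A real stream would buffer the byte here.
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return; // already closed: no effect, and no re-thrown exception
    }
    closed = true; // mark closed first so a failed close is never retried
    uploadAttempts += 1;
    if (failOnClose) {
      throw new IOException("simulated upload failure");
    }
  }
}
```

With this shape, the first `close` reports the failure once and every later call does nothing.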
   
   The original fix also did not address the underlying problem: the stream 
was closed twice and data files were still emitted. This PR fixes double close 
cases in `BaseTaskWriter` by ensuring the writer that is closed is always set 
to null.
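
The idea of always clearing the writer reference can be sketched as below. This is an assumption-laden illustration, not the code in this PR: `RollingWriterSketch` and `CountingCloseable` are hypothetical stand-ins, and the real `BaseTaskWriter` has more state than shown here.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical inner writer that always fails to close and counts the attempts.
class CountingCloseable implements Closeable {
  int closeCalls = 0;

  @Override
  public void close() throws IOException {
    closeCalls += 1;
    throw new IOException("simulated close failure");
  }
}

// Hypothetical stand-in for a writer that owns a current file writer.
class RollingWriterSketch implements Closeable {
  private Closeable currentWriter;

  RollingWriterSketch(Closeable writer) {
    this.currentWriter = writer;
  }

  @Override
  public void close() throws IOException {
    if (currentWriter != null) {
      try {
        currentWriter.close();
      } finally {
        // Clear the reference whether or not close succeeded, so a second
        // call to close() cannot reach the same underlying writer again.
        currentWriter = null;
      }
    }
  }
}
```

Using `finally` means the reference is dropped even when the inner `close` throws, which is the case that previously allowed a second close.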
   
   This PR also attempts to fix the problem of duplicate data in Flink by 
throwing an `IllegalStateException` if a call to `close` fails but `complete` 
is still called on the `BaseTaskWriter`. The write result from `complete` is 
the only way to emit data files, so this ensures that no data files will be 
emitted from a writer that failed while closing a file. This is a precaution 
because it isn't clear why failed Flink writers were still emitting data files.
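
A minimal sketch of that guard, under stated assumptions: the class `GuardedTaskWriterSketch`, its `failed` flag, and the use of file paths as the "write result" are all hypothetical simplifications for illustration and do not mirror the actual `BaseTaskWriter` fields or `WriteResult` type.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical task writer: complete() refuses to return results after a failed close.
class GuardedTaskWriterSketch {
  private final List<String> completedFiles = new ArrayList<>();
  private Closeable currentWriter;
  private String currentPath;
  private boolean failed = false;

  void open(String path, Closeable writer) {
    this.currentPath = path;
    this.currentWriter = writer;
  }

  void closeCurrent() throws IOException {
    if (currentWriter == null) {
      return;
    }
    Closeable toClose = currentWriter;
    String path = currentPath;
    currentWriter = null; // never close the same writer twice
    currentPath = null;
    try {
      toClose.close();
      completedFiles.add(path); // only successfully closed files are recorded
    } catch (IOException | RuntimeException e) {
      failed = true; // remember the failure for complete()
      throw e;
    }
  }

  List<String> complete() throws IOException {
    closeCurrent();
    if (failed) {
      throw new IllegalStateException("Cannot return results from a failed writer");
    }
    return new ArrayList<>(completedFiles);
  }
}
```

Because `complete` is the only path that returns files in this sketch, a caller that ignores the exception from a failed close still cannot obtain a result to commit.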


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

