rdblue opened a new pull request, #5683: URL: https://github.com/apache/iceberg/pull/5683
This is an alternative fix for the Flink double close problem, #4168 and #5310. The original solution modified `S3OutputStream` so that every call to `close` after a failure would re-throw the original exception. That violates the [contract for `close`](https://docs.oracle.com/javase/8/docs/api/java/io/Closeable.html#close--), which states:

> Closes this stream and releases any system resources associated with it. **If the stream is already closed then invoking this method has no effect.**

The original fix also did not address the underlying problem: the stream was closed twice and still emitted data files.

This PR fixes the double close cases in `BaseTaskWriter` by ensuring that the writer that was closed is always set to null. It also attempts to fix the duplicate data problem in Flink by throwing an `IllegalStateException` if a call to `close` fails but `complete` is still called on the `BaseTaskWriter`. The write result from `complete` is the only way to emit data files, so this ensures that no data files are emitted from a writer that failed while closing a file. This is a precaution because it isn't clear why failed Flink writers were still emitting data files.
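A minimal sketch of the idea described above, not the actual `BaseTaskWriter` code: the class, field, and method names (`TaskWriterSketch`, `currentWriter`, `closeFailed`, `openWriter`, `buildResult`) are hypothetical and stand in for the real internals. It shows the two guards the PR describes: nulling out the writer reference so a second `close` is a no-op, and refusing to return a result from `complete` after a failed close.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative sketch only; names below are not the actual Iceberg internals.
abstract class TaskWriterSketch<R> implements Closeable {
  private Closeable currentWriter;   // the currently open file writer, if any
  private boolean closeFailed = false;

  protected void openWriter(Closeable writer) {
    this.currentWriter = writer;
  }

  @Override
  public void close() throws IOException {
    if (currentWriter != null) {
      Closeable toClose = currentWriter;
      // Null the reference first so a second close() call has no effect,
      // as the Closeable contract requires.
      currentWriter = null;
      try {
        toClose.close();
      } catch (IOException | RuntimeException e) {
        closeFailed = true;
        throw e;
      }
    }
  }

  // complete() is the only path that emits data files; refusing to run after
  // a failed close guarantees no files are emitted by a broken writer.
  public R complete() throws IOException {
    if (closeFailed) {
      throw new IllegalStateException("Cannot return results from failed writer");
    }
    close();
    return buildResult();
  }

  protected abstract R buildResult();
}
```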
