wgtmac commented on code in PR #3269:
URL: https://github.com/apache/parquet-java/pull/3269#discussion_r2302645726
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java:
##########
@@ -1804,14 +1804,27 @@ private static void copy(SeekableInputStream from, PositionOutputStream to, long
* @throws IOException if there is an error while writing
*/
public void end(Map<String, String> extraMetaData) throws IOException {
+ final long footerStart = out.getPos();
+
+  // Build the footer metadata in memory using the helper stream
+  InMemoryPositionOutputStream buffer = new InMemoryPositionOutputStream(footerStart);
Review Comment:
> The goal of this change was to reduce the likelihood of partial data written to disk during the footer write
No, the original issue was about how to deal with the error state after a failed `end` call. Concrete output stream implementations should already use an internal buffer as you did in this PR (or applications can easily wrap the stream with a buffered output stream themselves), so it is not our responsibility to over-optimize this.
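For illustration, a minimal sketch of the application-side wrapping (the file name and buffer size here are arbitrary examples, not anything this PR prescribes):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedSinkExample {
  public static void main(String[] args) throws IOException {
    // Coalesce many small writes (e.g. individual footer fields) into
    // larger flushes by buffering in the application layer, before the
    // bytes reach the underlying file system.
    try (OutputStream out = new BufferedOutputStream(
        new FileOutputStream("example.parquet"), 64 * 1024)) {
      out.write(new byte[] {'P', 'A', 'R', '1'}); // placeholder payload
    } // try-with-resources flushes the buffer and closes the stream
  }
}
```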
I think the real problem is to figure out how downstream projects deal with a failed `ParquetFileWriter.end()`. IMHO, they should simply call `close` instead of calling `end` again after a failed call.
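A rough sketch of that pattern (assuming the writer exposes a `close()` that releases resources without retrying the footer write; `finish`, `writer`, and `metadata` are placeholders for illustration):

```java
import java.io.IOException;
import java.util.Map;
import org.apache.parquet.hadoop.ParquetFileWriter;

class FooterFailureHandling {
  // Hypothetical helper: on a failed end(), release resources via close()
  // rather than calling end() a second time.
  static void finish(ParquetFileWriter writer, Map<String, String> metadata)
      throws IOException {
    try {
      writer.end(metadata); // may fail partway through the footer write
    } catch (IOException e) {
      try {
        writer.close(); // assumed close(): clean up, do not retry end()
      } catch (IOException suppressed) {
        e.addSuppressed(suppressed);
      }
      throw e; // surface the original failure to the caller
    }
  }
}
```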
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]