wgtmac commented on code in PR #3269:
URL: https://github.com/apache/parquet-java/pull/3269#discussion_r2302645726


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java:
##########
@@ -1804,14 +1804,27 @@ private static void copy(SeekableInputStream from, PositionOutputStream to, long
    * @throws IOException if there is an error while writing
    */
   public void end(Map<String, String> extraMetaData) throws IOException {
+    final long footerStart = out.getPos();
+
+    // Build the footer metadata in memory using the helper stream
+    InMemoryPositionOutputStream buffer = new InMemoryPositionOutputStream(footerStart);

Review Comment:
   > The goal of this change was to reduce the likelihood of partial data written to disk during the footer write
   
   No, the original issue was about handling the error state after a failed `end` call. Concrete output stream implementations should already use an internal buffer as you did in this PR (or applications can easily wrap the stream with a buffered output stream themselves; see the sketch below), so it is not our responsibility to over-optimize this.
   
   I think the real problem is figuring out how downstream projects should handle a failed `ParquetFileWriter.end()`. IMHO, after a failed call they should simply call `close` instead of calling `end` again.
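   
   A minimal sketch of that recommended pattern, assuming the writer exposes a `close()` that releases resources without re-attempting the footer write (the helper name `finish` is hypothetical):
   
   ```java
   import java.io.IOException;
   import java.util.Map;
   
   import org.apache.parquet.hadoop.ParquetFileWriter;
   
   class FooterErrorHandling {
     // Hypothetical helper: on a failed end(), release resources via close()
     // (assumed available per the suggestion above) rather than retrying
     // end() on a writer left in an error state.
     static void finish(ParquetFileWriter writer, Map<String, String> extraMetaData)
         throws IOException {
       try {
         writer.end(extraMetaData);
       } catch (IOException e) {
         writer.close(); // do not call end() again after a failure
         throw e;
       }
     }
   }
   ```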



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

