huyuanfeng2018 opened a new pull request, #7696:
URL: https://github.com/apache/iceberg/pull/7696

   When writing to an Iceberg table with Flink, I added a layer of encapsulation around the output stream, which is eventually wrapped in a HadoopPositionOutputStream. That class implements DelegatingOutputStream and overrides finalize() to close the stream if it has not been closed yet. This interacts badly with ParquetIO:
   
   ```
     static PositionOutputStream stream(org.apache.iceberg.io.PositionOutputStream stream) {
       if (stream instanceof DelegatingOutputStream) {
         OutputStream wrapped = ((DelegatingOutputStream) stream).getDelegate();
         if (wrapped instanceof FSDataOutputStream) {
           return HadoopStreams.wrap((FSDataOutputStream) wrapped);
         }
       }
       return new ParquetOutputStreamAdapter(stream);
     }
   ```
   Here the delegate is extracted and re-wrapped, and the original HadoopPositionOutputStream is discarded. From that point on the HadoopPositionOutputStream may have no remaining references and becomes eligible for garbage collection. If GC collects it while the file is still being written, its finalize() closes the underlying stream mid-write, corrupting the file.
   
   This PR fixes the problem by adding a takeDelegate() method to DelegatingOutputStream: it returns the delegate and sets the internal reference to null, so that a later finalize() has nothing left to close.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
