Re: [I] [Spark] parquet ingestion to azure gets stuck when hadoop fs cache is disabled [iceberg]

via GitHub Fri, 05 Jun 2026 11:03:53 -0700


palladium-coder commented on issue #16640:
URL: https://github.com/apache/iceberg/issues/16640#issuecomment-4634178837


   Hello,
   
   @steveloughran Thanks for the suggestions. We are trying them out.
   
   Digging deeper, I see scope for improvement on the Iceberg side.
   
   For example, [HadoopOutputFile](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/hadoop/HadoopOutputFile.java#L57) lets the user provide their own FileSystem, but [ParquetIO](https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ParquetIO.java#L70) never uses it. Instead, a new FileSystem is created in [another library](https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopOutputFile.java#L58). This special handling of HadoopOutputFile isn't known to users of the Iceberg library, and it only surfaced when we disabled the FileSystem cache.
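   To illustrate why this only shows up with the cache disabled, here is a minimal toy model in Python. It is not the actual Hadoop, Iceberg, or Parquet code: every class and function name below is hypothetical, the path is made up, and the cache is keyed by scheme only for simplicity. It assumes that the re-wrapping step keeps only the path and looks the FileSystem up again, which is how the linked code appears to behave.

```python
import itertools


class ToyFileSystem:
    """Hypothetical stand-in for a Hadoop FileSystem instance."""

    _ids = itertools.count(1)

    def __init__(self, scheme):
        self.scheme = scheme
        self.instance_id = next(self._ids)


class ToyFileSystemFactory:
    """Toy model of a FileSystem lookup with an optional cache.

    Simplification: the cache key here is the scheme only.
    """

    def __init__(self, cache_disabled):
        self.cache_disabled = cache_disabled
        self._cache = {}

    def get(self, path):
        scheme = path.split("://", 1)[0]
        if self.cache_disabled:
            # With the cache off, every lookup builds a fresh instance.
            return ToyFileSystem(scheme)
        if scheme not in self._cache:
            self._cache[scheme] = ToyFileSystem(scheme)
        return self._cache[scheme]


class ToyOutputFile:
    """Hypothetical output file that carries the caller's FileSystem."""

    def __init__(self, fs, path):
        self.fs = fs
        self.path = path


def rewrap_from_path(output_file, factory):
    """Models the re-wrapping step: keeps the path, drops the caller's fs."""
    return ToyOutputFile(factory.get(output_file.path), output_file.path)


def caller_fs_survives(cache_disabled):
    """Return True if the caller's FileSystem is still used after re-wrapping."""
    factory = ToyFileSystemFactory(cache_disabled)
    path = "abfs://container@account/table/data.parquet"  # made-up path
    user_fs = factory.get(path)
    original = ToyOutputFile(user_fs, path)
    rewrapped = rewrap_from_path(original, factory)
    return rewrapped.fs is user_fs
```

   In this model, `caller_fs_survives(False)` is true because the second lookup hits the cache and returns the same instance, which hides the fact that the caller's FileSystem was dropped. `caller_fs_survives(True)` is false: the writer ends up with a different instance than the one the caller supplied.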


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


