Re: [PR] Parquet: Keep FileSystem reachable during writes when Hadoop FS cache is disabled [iceberg]

via GitHub Mon, 01 Jun 2026 11:57:43 -0700


steveloughran commented on PR #16641:
URL: https://github.com/apache/iceberg/pull/16641#issuecomment-4595573869


   This isn't needed
   
   1. it is fixed in Hadoop 3.4.0 at the level it should be: in the abfs output
stream
   2. it's only going to surface on Spark 3, which for Iceberg 1.11.0 means Spark 3.5
   3. and then only when fs caching is disabled
   
   
   Disabling abfs caching is a performance killer as you lose the thread pool,
the HTTP connection pool and the HTTP prefetch buffer pool. Unless you have
some fundamental reason to disable it, fix that.
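
   To find where caching has been turned off, here is a minimal sketch,
assuming the settings are available as plain key/value pairs. Hadoop's
per-scheme switch is `fs.<scheme>.impl.disable.cache`, and Spark passes through
keys carrying the `spark.hadoop.` prefix. The class and method names are
hypothetical and not part of Hadoop, Spark or Iceberg.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the filesystem schemes whose Hadoop FS cache
// has been disabled in a set of configuration key/value pairs.
public class FsCacheCheck {

    // Matches fs.<scheme>.impl.disable.cache, optionally with the
    // spark.hadoop. prefix that Spark uses for Hadoop options.
    private static final Pattern DISABLE_CACHE_KEY = Pattern.compile(
        "^(?:spark\\.hadoop\\.)?fs\\.([^.]+)\\.impl\\.disable\\.cache$");

    // Returns the schemes (sorted, de-duplicated) whose cache is disabled.
    static List<String> schemesWithCacheDisabled(Map<String, String> conf) {
        TreeSet<String> schemes = new TreeSet<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            Matcher m = DISABLE_CACHE_KEY.matcher(e.getKey());
            String value = e.getValue();
            if (m.matches() && value != null
                    && Boolean.parseBoolean(value.trim())) {
                schemes.add(m.group(1));
            }
        }
        return new ArrayList<>(schemes);
    }

    public static void main(String[] args) {
        // Example settings; these values are made up for illustration.
        Map<String, String> conf = Map.of(
            "spark.hadoop.fs.abfs.impl.disable.cache", "true",
            "fs.abfss.impl.disable.cache", "false");
        for (String scheme : schemesWithCacheDisabled(conf)) {
            System.out.println("FS cache disabled for scheme: " + scheme);
        }
    }
}
```

   Any scheme this reports is one where each `FileSystem.get()` call creates a
new instance instead of reusing a cached one; removing the setting, or setting
it to `false`, re-enables the cache.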
   
   ...but it's not needed in the Iceberg main branch once Spark 3.5 support is
dropped; it's a very transient workaround for the "caching disabled" situation.
Once the spark-3.5 branch goes, this PR becomes superfluous and should really be
rolled back to keep the codebase leaner.
   
   Can't you just enable caching and speed all your work up at the same time?
   
   Or, if there is some abfs issue which requires caching to be turned off, why
not discuss it here or create a HADOOP- JIRA on fs/azure? Though as usual:
test against 3.4.3/3.5.0 before reporting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

