steveloughran commented on PR #16641: URL: https://github.com/apache/iceberg/pull/16641#issuecomment-4595573869
This isn't needed:

1. It is fixed in Hadoop 3.4.0 at the level it should be: in the abfs output stream.
2. It's only going to surface on Spark 3, which on 1.11.0 means 3.5.
3. And then only when fs caching is disabled.

Disabling abfs caching is a performance killer, as you lose the thread pool, the HTTP connection pool and the HTTP prefetch buffer pool. Unless you have some fundamental reason, fix that.

...but it's not needed in the Iceberg main branch once Spark 3.5 is cut; it's a very transient workaround for the situation "caching disabled". Once the spark-3.5 branch goes, this PR just becomes superfluous and should really be rolled back to keep the codebase leaner.

Can't you just enable caching and speed all your work up at the same time? Or, if there is some abfs issue which requires caching to be turned off, why not discuss it here or create a HADOOP- JIRA on fs/azure? Though as usual: test against 3.4.3/3.5.0 before reporting.
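To check whether a job is running in the "caching disabled" situation described above, a rough sketch along these lines may help. It assumes the setting in question is Hadoop's per-scheme `fs.<scheme>.impl.disable.cache` option (optionally carrying Spark's `spark.hadoop.` prefix), which the comment does not name explicitly. The function name and the plain-dict input are hypothetical, for illustration only.

```python
def schemes_with_cache_disabled(conf, schemes=("abfs", "abfss")):
    """Return the schemes whose Hadoop FileSystem cache is disabled in `conf`.

    `conf` is a plain dict of configuration keys to values, for example
    collected from a Spark or Hadoop configuration (assumption: the caller
    has already extracted it into a dict).
    """
    disabled = []
    for scheme in schemes:
        # Check both the raw Hadoop key and the Spark-prefixed form.
        for prefix in ("", "spark.hadoop."):
            key = f"{prefix}fs.{scheme}.impl.disable.cache"
            if str(conf.get(key, "false")).strip().lower() == "true":
                disabled.append(scheme)
                break
    return disabled


if __name__ == "__main__":
    example = {"spark.hadoop.fs.abfs.impl.disable.cache": "true"}
    print(schemes_with_cache_disabled(example))
```

If the returned list is non-empty, removing those settings (or setting them to `false`) should restore the cached filesystem instance and the shared pools mentioned above.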
