lwfitzgerald opened a new pull request, #2495:
URL: https://github.com/apache/iceberg-python/pull/2495

   <!--
   Thanks for opening a pull request!
   -->
   
   <!-- In the case this PR will resolve an issue, please replace 
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
   <!-- Closes #${GITHUB_ISSUE_ID} -->
   
   # Rationale for this change
   
   `FsspecFileIO.get_fs` can be called by multiple threads when 
`ExecutorFactory` is used (for example by `DataScan.plan_files`).
   
   The base class of `fsspec` filesystem objects, 
`fsspec.spec.AbstractFileSystem`, internally caches instances through the 
`fsspec.spec._Cached` metaclass. The caching key used includes 
`threading.get_ident()`, making entries thread-local:
   
https://github.com/fsspec/filesystem_spec/blame/f84b99f0d1f079f990db1a219b74df66ab3e7160/fsspec/spec.py#L71
   
   The `FsspecFileIO.get_fs` LRU cache (wrapping `FsspecFileIO._get_fs`) breaks 
this thread-locality: the same filesystem instance is returned to every thread 
that asks for it.
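   To illustrate the problem, here is a minimal stdlib-only sketch (not the 
actual pyiceberg code) showing how a shared `functools.lru_cache` hands the same 
cached instance to every thread, defeating any per-thread instance caching 
underneath it:

   ```python
   import threading
   from functools import lru_cache

   @lru_cache
   def _get_fs(scheme: str):
       # Stand-in for constructing an fsspec filesystem for `scheme`;
       # the real factory is hypothetical here.
       return object()

   main_fs = _get_fs("s3")

   seen = []
   worker = threading.Thread(target=lambda: seen.append(_get_fs("s3")))
   worker.start()
   worker.join()

   # The worker thread receives the exact same instance as the main
   # thread, because the lru_cache is shared process-wide.
   assert seen[0] is main_fs
   ```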
   
   One consequence is that, for `s3fs.S3FileSystem`, HTTP connection pooling no 
longer happens per thread (as it normally does with `aiobotocore`), because the 
`aiobotocore` client object (which holds the `aiohttp.ClientSession`) is stored 
on the shared `s3fs.S3FileSystem` instance.
   
   This change addresses this by making the `FsspecFileIO.get_fs` cache 
thread-local.
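   A thread-local per-instance cache can be sketched as below. This is a 
stdlib-only illustration with hypothetical names (`FsspecFileIO`, `_get_fs`), 
not the PR's actual implementation:

   ```python
   import threading
   from functools import lru_cache

   class FsspecFileIO:
       """Sketch of a FileIO whose filesystem cache is thread-local."""

       def __init__(self):
           # Each thread sees its own attributes on this object.
           self._local = threading.local()

       def get_fs(self, scheme: str):
           # Lazily build one lru_cache-wrapped factory per thread, so
           # cached filesystem instances are never shared across threads.
           if not hasattr(self._local, "get_fs"):
               self._local.get_fs = lru_cache(self._get_fs)
           return self._local.get_fs(scheme)

       def _get_fs(self, scheme: str):
           # Stand-in for constructing an fsspec filesystem for `scheme`.
           return object()

   io = FsspecFileIO()
   fs_a = io.get_fs("s3")

   # Same thread: the cache still deduplicates repeated lookups.
   assert io.get_fs("s3") is fs_a

   # Different thread: a distinct instance is created, restoring the
   # thread-locality that fsspec's own instance cache expects.
   other = []
   worker = threading.Thread(target=lambda: other.append(io.get_fs("s3")))
   worker.start()
   worker.join()
   assert other[0] is not fs_a
   ```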
   
   ## Are these changes tested?
   
   Tested locally.
   
   ## Are there any user-facing changes?
   
   Yes - S3 HTTP connection pooling now occurs per-thread, matching the normal 
behaviour of `aiobotocore`.
   
   <!-- In the case of user-facing changes, please add the changelog label. -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

