[GitHub] [iceberg] Fokko commented on a diff in pull request #8549: Python: Fix caching of FileSystem

via GitHub Tue, 12 Sep 2023 01:38:28 -0700


Fokko commented on code in PR #8549:
URL: https://github.com/apache/iceberg/pull/8549#discussion_r1322658845



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -290,7 +290,7 @@ def to_input_file(self) -> PyArrowFile:
 
 class PyArrowFileIO(FileIO):
     def __init__(self, properties: Properties = EMPTY_DICT):
-        self.get_fs: Callable[[str], FileSystem] = lru_cache(self._get_fs)
+        self.get_fs: Callable[[str], FileSystem] = lru_cache(self._initialize_fs)

Review Comment:
   I'm almost inclined to remove the `lru_cache` altogether and expose a single 
`fs` property instead. With the cache, a single FileIO could connect to both S3 
and GCS, but I'm not sure whether that happens in practice, or whether we should 
allow it. A FileIO is bound to a table, so this would mean that a table's files 
are distributed across different object stores 🤔 
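
To make the trade-off concrete, here is a minimal sketch of the two options. It is not the pyiceberg implementation: `FakeFileSystem`, `PerSchemeFileIO`, `SingleFsFileIO`, and the `location` argument are hypothetical stand-ins, and only the `lru_cache(self._initialize_fs)` line mirrors the diff above.

```python
from functools import cached_property, lru_cache
from typing import Callable, Dict
from urllib.parse import urlparse


class FakeFileSystem:
    """Hypothetical stand-in for a pyarrow FileSystem (assumption, not the real class)."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme


class PerSchemeFileIO:
    """Option from the diff: cache one filesystem per URI scheme."""

    def __init__(self, properties: Dict[str, str]) -> None:
        self.properties = properties
        # Wrapping the bound method per instance keeps the cache local to this FileIO.
        self.get_fs: Callable[[str], FakeFileSystem] = lru_cache(self._initialize_fs)

    def _initialize_fs(self, scheme: str) -> FakeFileSystem:
        # A real implementation would build an S3, GCS, or local filesystem here.
        return FakeFileSystem(scheme)


class SingleFsFileIO:
    """Alternative from the comment: one lazily created filesystem per FileIO."""

    def __init__(self, location: str, properties: Dict[str, str]) -> None:
        # Assumption: the scheme is derived once from a hypothetical table location.
        self.scheme = urlparse(location).scheme or "file"
        self.properties = properties

    @cached_property
    def fs(self) -> FakeFileSystem:
        # Created on first access, then reused for the lifetime of this FileIO.
        return FakeFileSystem(self.scheme)
```

The per-scheme variant lets one FileIO serve `s3` and `gs` paths side by side, while the property variant assumes every file of a table lives in the same object store.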


