Fokko commented on code in PR #8549:
URL: https://github.com/apache/iceberg/pull/8549#discussion_r1322658845


##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -290,7 +290,7 @@ def to_input_file(self) -> PyArrowFile:
 
 class PyArrowFileIO(FileIO):
     def __init__(self, properties: Properties = EMPTY_DICT):
-        self.get_fs: Callable[[str], FileSystem] = lru_cache(self._get_fs)
+        self.get_fs: Callable[[str], FileSystem] = lru_cache(self._initialize_fs)

Review Comment:
   I'm almost inclined to remove the `lru_cache` altogether and make `fs` a 
property. We could use a single FileIO to connect to both S3 and GCS, but I'm 
not sure whether that happens in practice, or whether we should allow it. A 
FileIO is bound to a table, so this would mean that a table is distributed 
across different object stores 🤔 
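   A minimal sketch contrasting the two designs, assuming the filesystem cache 
is keyed by URI scheme. `FakeFileSystem`, `PerSchemeFileIO`, `SingleFsFileIO` 
and `fs_for` are hypothetical names, not PyIceberg API; the stand-in class 
replaces pyarrow's `FileSystem` so the sketch runs without pyarrow.

```python
from functools import cached_property, lru_cache
from typing import Callable
from urllib.parse import urlparse


class FakeFileSystem:
    """Hypothetical stand-in for a pyarrow FileSystem."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme


class PerSchemeFileIO:
    """Current approach: one lazily created filesystem per URI scheme."""

    def __init__(self) -> None:
        self.init_calls = 0
        # Wrapping the bound method gives each instance its own cache,
        # keyed by the scheme argument.
        self.get_fs: Callable[[str], FakeFileSystem] = lru_cache(self._initialize_fs)

    def _initialize_fs(self, scheme: str) -> FakeFileSystem:
        self.init_calls += 1
        return FakeFileSystem(scheme)

    def fs_for(self, location: str) -> FakeFileSystem:
        # Hypothetical helper: route a location to the filesystem for its scheme.
        return self.get_fs(urlparse(location).scheme)


class SingleFsFileIO:
    """Alternative: the FileIO is bound to exactly one filesystem."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme
        self.init_calls = 0

    @cached_property
    def fs(self) -> FakeFileSystem:
        # Created on first access, then stored on the instance.
        self.init_calls += 1
        return FakeFileSystem(self.scheme)

    def fs_for(self, location: str) -> FakeFileSystem:
        # Hypothetical helper: reject locations on a different object store.
        scheme = urlparse(location).scheme
        if scheme != self.scheme:
            raise ValueError(f"FileIO is bound to {self.scheme!r}, got {scheme!r}")
        return self.fs


per_scheme = PerSchemeFileIO()
per_scheme.fs_for("s3://bucket/a.parquet")
per_scheme.fs_for("s3://bucket/b.parquet")
per_scheme.fs_for("gs://bucket/c.parquet")
print(per_scheme.init_calls)  # 2

single = SingleFsFileIO("s3")
print(single.fs_for("s3://bucket/a.parquet") is single.fs)  # True
```

   With `lru_cache`, one FileIO can serve several object stores and builds a 
filesystem per scheme on demand; with an `fs` property, the FileIO commits to 
a single store, and a location on another store becomes an error.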



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
