Fokko commented on code in PR #5747:
URL: https://github.com/apache/iceberg/pull/5747#discussion_r968935739


##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -164,7 +163,27 @@ def to_input_file(self) -> "PyArrowFile":
         return self
 
 
+
 class PyArrowFileIO(FileIO):
+
+    def __init__(self, properties: Properties = {}):
+        self.get_fs_and_path: Callable = lru_cache(self._get_fs_and_path)
+        super().__init__(properties=properties)
+
+    def _get_fs_and_path(self, location: str) -> (FileSystem, str):

Review Comment:
   Probably you want to use a `Tuple` here:
   ```suggestion
       def _get_fs_and_path(self, location: str) -> Tuple[FileSystem, str]:
   ```



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -164,7 +163,27 @@ def to_input_file(self) -> "PyArrowFile":
         return self
 
 
+
 class PyArrowFileIO(FileIO):
+
+    def __init__(self, properties: Properties = {}):

Review Comment:
   In order to avoid using a mutable type here, you probably want to use 
`EMPTY_DICT` from `from pyiceberg.typedef import EMPTY_DICT`
   ```suggestion
       def __init__(self, properties: Properties = EMPTY_DICT):
   ```



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -35,13 +36,14 @@
     OutputFile,
     OutputStream,
 )
+from pyiceberg.typedef import Properties
 
 
 class PyArrowFile(InputFile, OutputFile):
     """A combined InputFile and OutputFile implementation that uses a pyarrow 
filesystem to generate pyarrow.lib.NativeFile instances
 
     Args:
-        location(str): A URI or a path to a local file

Review Comment:
   I don't think we should change this, because it would diverge from the 
parent class: 
https://github.com/apache/iceberg/blob/master/python/pyiceberg/io/__init__.py#L113-L114



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -164,7 +163,27 @@ def to_input_file(self) -> "PyArrowFile":
         return self
 
 
+
 class PyArrowFileIO(FileIO):
+
+    def __init__(self, properties: Properties = {}):
+        self.get_fs_and_path: Callable = lru_cache(self._get_fs_and_path)
+        super().__init__(properties=properties)
+
+    def _get_fs_and_path(self, location: str) -> (FileSystem, str):

Review Comment:
   Instead of returning the path, couldn't we use the original location and 
just return the filesystem?



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -164,7 +163,27 @@ def to_input_file(self) -> "PyArrowFile":
         return self
 
 
+
 class PyArrowFileIO(FileIO):
+
+    def __init__(self, properties: Properties = {}):
+        self.get_fs_and_path: Callable = lru_cache(self._get_fs_and_path)
+        super().__init__(properties=properties)
+
+    def _get_fs_and_path(self, location: str) -> (FileSystem, str):
+        uri = urlparse(location)  # Create a ParseResult from the URI
+        if not uri.scheme:  # If no scheme, assume the path is to a local file
+            return FileSystem.from_uri(os.path.abspath(location))
+        elif uri.scheme in ['s3', 's3a', 's3n']:
+            client_kwargs = {
+                "endpoint_override": self.properties.get("s3.endpoint"),
+                "access_key": self.properties.get("s3.access-key-id"),
+                "secret_key": self.properties.get("s3.secret-access-key"),
+            }
+            return (S3FileSystem(**client_kwargs), uri.netloc + uri.path)
+        else:
+            return FileSystem.from_uri(location)  # Infer the proper filesystem

Review Comment:
   Here we don't return a tuple.



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -164,7 +163,27 @@ def to_input_file(self) -> "PyArrowFile":
         return self
 
 
+
 class PyArrowFileIO(FileIO):
+
+    def __init__(self, properties: Properties = {}):
+        self.get_fs_and_path: Callable = lru_cache(self._get_fs_and_path)
+        super().__init__(properties=properties)
+
+    def _get_fs_and_path(self, location: str) -> (FileSystem, str):
+        uri = urlparse(location)  # Create a ParseResult from the URI
+        if not uri.scheme:  # If no scheme, assume the path is to a local file
+            return FileSystem.from_uri(os.path.abspath(location))
+        elif uri.scheme in ['s3', 's3a', 's3n']:

Review Comment:
   Nit:
   ```suggestion
           elif uri.scheme in {'s3', 's3a', 's3n'}:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to