Fokko commented on code in PR #5588:
URL: https://github.com/apache/iceberg/pull/5588#discussion_r950887942
##########
python/pyiceberg/io/__init__.py:
##########
@@ -218,11 +234,53 @@ def delete(self, location: Union[str, InputFile, OutputFile]) -> None:
"""
-def load_file_io(_: Properties) -> FileIO:
- # To be implemented in a different PR.
- # - If py-file-io is present, load the right Python class
- # - When the property is missing, map from Java's filo-io to an appropriate FileIO
- # - Extend the FileIO structure with a initialize that pass in properties (could also be the constructor?)
+ARROW_FILE_IO = "pyiceberg.io.pyarrow.PyArrowFileIO"
+
+# Mappings from the Java FileIO impl to a Python one. The list is ordered by preference.
+# If a implementation isn't installed, it will fall back to the next one.
+JAVA_FILE_IO_MAPPINGS: Dict[str, List[str]] = {
+ "org.apache.iceberg.dell.ecs.EcsFileIO": [ARROW_FILE_IO],
+ "org.apache.iceberg.gcp.gcs.GCSFileIO": [ARROW_FILE_IO],
+ "org.apache.iceberg.hadoop.HadoopFileIO": [ARROW_FILE_IO],
+ "org.apache.iceberg.aliyun.oss.OSSFileIO": [ARROW_FILE_IO],
+ "org.apache.iceberg.io.ResolvingFileIO": [ARROW_FILE_IO],
+ "org.apache.iceberg.aws.s3.S3FileIO": [ARROW_FILE_IO],
Review Comment:
The `load_` methods were created for loading the catalog in a lazy manner.
We could create a method like `pyiceberg.io.pyarrow:load_file_io`, but I don't
see what benefit that brings, at the cost of additional complexity.
Something like:
```
# Default to PyArrow
from pyiceberg.io.pyarrow import load_pyarrow
return load_pyarrow()
```
But then we would still need a fallback for FileIOs that live outside of
the `pyiceberg` package. I like the current implementation because it imports
exactly the FileIO that we're looking for, and it doesn't import the class
beforehand, which avoids importing optional dependencies unnecessarily.