rdblue commented on code in PR #5588:
URL: https://github.com/apache/iceberg/pull/5588#discussion_r950586786
##########
python/pyiceberg/io/__init__.py:
##########
@@ -218,11 +234,53 @@ def delete(self, location: Union[str, InputFile, OutputFile]) -> None:
"""
-def load_file_io(_: Properties) -> FileIO:
- # To be implemented in a different PR.
- # - If py-file-io is present, load the right Python class
- # - When the property is missing, map from Java's filo-io to an appropriate FileIO
- # - Extend the FileIO structure with a initialize that pass in properties (could also be the constructor?)
+ARROW_FILE_IO = "pyiceberg.io.pyarrow.PyArrowFileIO"
+
+# Mappings from the Java FileIO impl to a Python one. The list is ordered by preference.
+# If a implementation isn't installed, it will fall back to the next one.
+JAVA_FILE_IO_MAPPINGS: Dict[str, List[str]] = {
Review Comment:
I'm not sure that a mapping from a Java implementation to a Python
implementation makes sense.
The Java class is needed because the implementation is dynamically loaded in
Java. There is a similar need here, but there's no connection between the
preferred Java implementation and the preferred one in Python. For example,
both S3 and GCS have direct implementations (`S3FileIO`, `GCSFileIO`) and can
also be used through `HadoopFileIO`. If you prefer `HadoopFileIO` in Java, that
doesn't necessarily mean that you prefer `PyArrowFileIO` over a future
`FSSpecFileIO` in Python. The two choices are probably independent.
I think a better option is to ignore the `io-impl` property from Java and
look at the catalog's warehouse location or a table's location. The scheme of
either location should tell us what the backing store is. Then we can look up
a list of implementations for that scheme.
So I think this should probably be:
```python
# Keyed by location scheme; each list is ordered by preference.
FILE_IO_MAPPINGS: Dict[str, List[str]] = {
    "s3": [ARROW_FILE_IO],
    "gcs": [ARROW_FILE_IO],
    "file": [ARROW_FILE_IO],
    "hdfs": [ARROW_FILE_IO],
    # ...
}
```
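To make the scheme-based lookup concrete, here is a rough sketch of how the mapping might be consumed. The helper names `candidates_for_location` and `import_first_available` are hypothetical and not part of this PR, and treating a location without a scheme as `file` is an assumption of the sketch, not something decided here.

```python
import importlib
from typing import Dict, List, Optional
from urllib.parse import urlparse

ARROW_FILE_IO = "pyiceberg.io.pyarrow.PyArrowFileIO"

# Scheme -> candidate implementations, ordered by preference.
SCHEME_TO_FILE_IO: Dict[str, List[str]] = {
    "s3": [ARROW_FILE_IO],
    "gcs": [ARROW_FILE_IO],
    "file": [ARROW_FILE_IO],
    "hdfs": [ARROW_FILE_IO],
}


def candidates_for_location(location: str) -> List[str]:
    """Return candidate class paths for a warehouse or table location."""
    # Assumption: a location without a scheme is a local path.
    scheme = urlparse(location).scheme or "file"
    return SCHEME_TO_FILE_IO.get(scheme, [])


def import_first_available(class_paths: List[str]) -> Optional[type]:
    """Import the first candidate whose module is installed, else None."""
    for path in class_paths:
        module_name, _, class_name = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Not installed: fall back to the next candidate.
            continue
        cls = getattr(module, class_name, None)
        if cls is not None:
            return cls
    return None
```

With something like this, `import_first_available(candidates_for_location(table_location))` picks an implementation from the location alone, without reading the Java `io-impl` property.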