elliotsteene-swap opened a new issue, #2236:
URL: https://github.com/apache/iceberg-rust/issues/2236
## Description
The Python bindings (`pyiceberg-core`) do not support GCS-backed Iceberg
tables when used with the DataFusion table provider. The underlying
`iceberg-storage-opendal` crate already has full GCS support via the
`opendal-gcs` feature flag, and the `OpenDalStorageFactory::Gcs` variant exists
in `crates/storage/opendal/src/lib.rs` — but it is not wired up in the Python
bindings.
This means anyone using pyiceberg + DataFusion with tables stored on `gs://`
hits a runtime error:
```
RuntimeError: Unsupported storage scheme: gs
```
## Steps to Reproduce
```python
from pyiceberg.catalog import load_catalog
from datafusion import SessionContext
catalog = load_catalog("my_catalog")  # REST catalog pointing to a GCS-backed warehouse
table = catalog.load_table("my_namespace.my_table")
ctx = SessionContext()
ctx.register_table("my_table", table) # <-- fails here
```
Error:
```
RuntimeError: Unsupported storage scheme: gs
```
The call path is:
1. `ctx.register_table()` calls `table.__datafusion_table_provider__()`
2. pyiceberg constructs `IcebergDataFusionTable` and calls its `__datafusion_table_provider__()`
3. The Rust-side `storage_factory_from_path()` in `bindings/python/src/datafusion_table_provider.rs` does not match the `gs` or `gcs` schemes
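Step 3 can be illustrated with a small self-contained Rust sketch of why a `gs://` location falls through. It assumes the scheme is derived by splitting the table location at `://` (the real function may parse a URL instead), and `scheme_of` and `supported_before_fix` are hypothetical helpers; the accepted schemes simply mirror the existing match arms in `storage_factory_from_path()`.

```rust
/// Hypothetical helper (assumption: the scheme is taken from the text before
/// "://"; the real bindings may parse a URL instead). A location with no
/// "://", such as a bare local path, yields "", which the existing match
/// treats as the local filesystem.
fn scheme_of(location: &str) -> &str {
    match location.split_once("://") {
        Some((scheme, _rest)) => scheme,
        None => "",
    }
}

/// Hypothetical check mirroring the arms currently in
/// `storage_factory_from_path()`: any scheme not listed here reaches the
/// `_ => Err("Unsupported storage scheme: ...")` branch.
fn supported_before_fix(scheme: &str) -> bool {
    matches!(scheme, "file" | "" | "s3" | "s3a" | "memory")
}
```

For a location such as `gs://my-bucket/warehouse/my_table`, `scheme_of` returns `"gs"` and `supported_before_fix("gs")` is `false`, which corresponds to the `Unsupported storage scheme: gs` error above.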
## Proposed Changes
### 1. Enable `opendal-gcs` feature in `bindings/python/Cargo.toml`
```diff
- iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory"] }
+ iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory", "opendal-gcs"] }
```
### 2. Add `gs`/`gcs` match arms in `bindings/python/src/datafusion_table_provider.rs`
In `storage_factory_from_path()`:
```diff
 let factory: Arc<dyn StorageFactory> = match scheme {
     "file" | "" => Arc::new(OpenDalStorageFactory::Fs),
     "s3" | "s3a" => Arc::new(OpenDalStorageFactory::S3 {
         configured_scheme: scheme.to_string(),
         customized_credential_load: None,
     }),
     "memory" => Arc::new(OpenDalStorageFactory::Memory),
+    "gs" | "gcs" => Arc::new(OpenDalStorageFactory::Gcs),
     _ => {
         return Err(PyRuntimeError::new_err(format!(
             "Unsupported storage scheme: {scheme}"
         )));
     }
 };
```
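A standalone model of the patched dispatch can help sanity-check the new arm without building the bindings. `Factory` and `factory_for_scheme` are hypothetical stand-ins for `OpenDalStorageFactory` and `storage_factory_from_path()`: the S3 variant is simplified, and `Arc`/`PyRuntimeError` are replaced by plain values and `String` errors, so this sketches the matching logic only, not the real types.

```rust
/// Hypothetical stand-in for `OpenDalStorageFactory`; only the variants the
/// bindings' match touches are modelled, and the S3 fields are simplified.
#[derive(Debug, PartialEq)]
enum Factory {
    Fs,
    S3 { configured_scheme: String },
    Memory,
    Gcs,
}

/// Sketch of the patched dispatch: `gs` and `gcs` now map to the GCS
/// factory, while unknown schemes still produce the same error text.
fn factory_for_scheme(scheme: &str) -> Result<Factory, String> {
    let factory = match scheme {
        "file" | "" => Factory::Fs,
        "s3" | "s3a" => Factory::S3 {
            configured_scheme: scheme.to_string(),
        },
        "memory" => Factory::Memory,
        // The new arm proposed in this issue.
        "gs" | "gcs" => Factory::Gcs,
        _ => return Err(format!("Unsupported storage scheme: {scheme}")),
    };
    Ok(factory)
}
```

With this arm in place, `factory_for_scheme("gs")` and `factory_for_scheme("gcs")` both return `Ok(Factory::Gcs)`, while a scheme such as `abfss` still returns an `Unsupported storage scheme: abfss` error.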
## Context
- The `OpenDalStorageFactory::Gcs` variant and `gcs_config_parse()` already exist in `crates/storage/opendal/src/lib.rs`
- The `opendal-gcs` feature flag is defined in `crates/storage/opendal/Cargo.toml` and is included in `opendal-all`
- GCS storage support was added to iceberg-rust in #520, with `gs://`/`gcs://` scheme support in #845 and OAuth support in #654
- This is a common use case for anyone using Google BigLake Metastore with pyiceberg and DataFusion
--
This is an automated message from the Apache Git Service.