elliotsteene-swap opened a new issue, #2236:
URL: https://github.com/apache/iceberg-rust/issues/2236

   ## Description
   
   The Python bindings (`pyiceberg-core`) do not support GCS-backed Iceberg tables when used with the DataFusion table provider. The underlying `iceberg-storage-opendal` crate already has full GCS support via the `opendal-gcs` feature flag, and the `OpenDalStorageFactory::Gcs` variant exists in `crates/storage/opendal/src/lib.rs`, but neither is wired up in the Python bindings.
   
   As a result, anyone using pyiceberg with DataFusion on tables stored under `gs://` hits a runtime error:
   
   ```
   RuntimeError: Unsupported storage scheme: gs
   ```
   
   ## Steps to Reproduce
   
   ```python
   from pyiceberg.catalog import load_catalog
   from datafusion import SessionContext
   
   catalog = load_catalog("my_catalog")  # REST catalog pointing to a GCS-backed warehouse
   table = catalog.load_table("my_namespace.my_table")
   
   ctx = SessionContext()
   ctx.register_table("my_table", table)  # <-- fails here
   ```
   
   Error:
   
   ```
   RuntimeError: Unsupported storage scheme: gs
   ```
   
   The call path is:
   1. `ctx.register_table()` calls `table.__datafusion_table_provider__()`
   2. pyiceberg constructs `IcebergDataFusionTable` and calls its `__datafusion_table_provider__()`
   3. The Rust-side `storage_factory_from_path()` in `bindings/python/src/datafusion_table_provider.rs` does not match the `gs` or `gcs` schemes
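   To make step 3 concrete, here is a small std-only sketch of the dispatch as it behaves today. It assumes the scheme is the text before `://` and that a bare local path yields an empty scheme (consistent with the `"file" | ""` arm in the diff below). `scheme_of` and `current_dispatch` are hypothetical names, and string labels stand in for the real `Arc<dyn StorageFactory>` values.

   ```rust
   /// Hypothetical helper: return the URI scheme of a table location,
   /// or "" for a bare local path. Assumes the scheme precedes "://".
   fn scheme_of(location: &str) -> &str {
       match location.split_once("://") {
           Some((scheme, _rest)) => scheme,
           None => "",
       }
   }

   /// Mirrors the match arms currently in `storage_factory_from_path()`
   /// (before the proposed fix), returning a label instead of a factory.
   fn current_dispatch(location: &str) -> Result<&'static str, String> {
       let scheme = scheme_of(location);
       match scheme {
           "file" | "" => Ok("Fs"),
           "s3" | "s3a" => Ok("S3"),
           "memory" => Ok("Memory"),
           // `gs` and `gcs` fall through to here today.
           _ => Err(format!("Unsupported storage scheme: {scheme}")),
       }
   }

   fn main() {
       // A GCS-backed table location reproduces the reported error text.
       let err = current_dispatch("gs://my-bucket/warehouse/my_table").unwrap_err();
       println!("{err}");
   }
   ```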
   
   ## Proposed Changes
   
   ### 1. Enable `opendal-gcs` feature in `bindings/python/Cargo.toml`
   
   ```diff
   - iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory"] }
   + iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory", "opendal-gcs"] }
   ```
   
   ### 2. Add `gs`/`gcs` match arms in `bindings/python/src/datafusion_table_provider.rs`
   
   In `storage_factory_from_path()`:
   
   ```diff
     let factory: Arc<dyn StorageFactory> = match scheme {
         "file" | "" => Arc::new(OpenDalStorageFactory::Fs),
         "s3" | "s3a" => Arc::new(OpenDalStorageFactory::S3 {
             configured_scheme: scheme.to_string(),
             customized_credential_load: None,
         }),
         "memory" => Arc::new(OpenDalStorageFactory::Memory),
   +     "gs" | "gcs" => Arc::new(OpenDalStorageFactory::Gcs),
         _ => {
             return Err(PyRuntimeError::new_err(format!(
                 "Unsupported storage scheme: {scheme}"
             )));
         }
     };
   ```
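   One way to sanity-check the new arm outside the crate is to mirror the patched match against a local stand-in enum. `Factory` and `factory_for_scheme` are hypothetical names; the real code returns `Arc<dyn StorageFactory>`, wraps errors in `PyRuntimeError`, and its `S3` variant also carries `customized_credential_load`, which is omitted here.

   ```rust
   /// Hypothetical stand-in for `OpenDalStorageFactory`, reduced to the
   /// variants the match needs (the real enum has more fields).
   #[derive(Debug, PartialEq)]
   enum Factory {
       Fs,
       S3 { configured_scheme: String },
       Memory,
       Gcs,
   }

   /// Mirrors `storage_factory_from_path()` with the proposed arm applied.
   fn factory_for_scheme(scheme: &str) -> Result<Factory, String> {
       match scheme {
           "file" | "" => Ok(Factory::Fs),
           "s3" | "s3a" => Ok(Factory::S3 {
               configured_scheme: scheme.to_string(),
           }),
           "memory" => Ok(Factory::Memory),
           // Proposed arm: both spellings map to the GCS backend.
           "gs" | "gcs" => Ok(Factory::Gcs),
           _ => Err(format!("Unsupported storage scheme: {scheme}")),
       }
   }

   fn main() {
       for scheme in ["gs", "gcs", "s3a", "abfss"] {
           println!("{scheme}: {:?}", factory_for_scheme(scheme));
       }
   }
   ```

   Because `gs` and `gcs` share a single arm, both `gs://` and `gcs://` locations resolve to the same backend, while unrelated schemes still reach the error arm.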
   
   ## Context
   
   - The `OpenDalStorageFactory::Gcs` variant and `gcs_config_parse()` already exist in `crates/storage/opendal/src/lib.rs`
   - The `opendal-gcs` feature flag is defined in `crates/storage/opendal/Cargo.toml` and is included in `opendal-all`
   - GCS storage support was added to iceberg-rust in #520, with `gs://`/`gcs://` scheme support in #845 and OAuth support in #654
   - This is a common use case for anyone using Google BigLake Metastore with pyiceberg and DataFusion


-- 
This is an automated message from the Apache Git Service.