martinseifertprojuventute opened a new issue, #1606:
URL: https://github.com/apache/iceberg-python/issues/1606

   ### Apache Iceberg version
   
   0.8.1 (latest release)
   
   ### Please describe the bug 🐞
   
   I have an external volume in Snowflake pointing to an Azure Data Lake Storage Gen2 (ADLS Gen2) account:
   
   ```
   create or replace external volume ev_iceberg_tables
   storage_locations =
       ((
           name = 'iceberg_snowflake_managed'
           storage_provider = 'AZURE'
           storage_base_url = 'azure://[storage_account].blob.core.windows.net/catalog/snowflake_managed/'
           azure_tenant_id = '[tenant]'
       ))
   ;
   ```
   
   So the container is called “catalog”, and the Open Catalog catalog I want to point to is called “snowflake_managed”. Next, this is my catalog integration:
   
   ```
   create or replace catalog integration i_iceberg_catalog
   catalog_source = polaris
   table_format = iceberg
   catalog_namespace= 'default'
   rest_config = (
       catalog_uri = 'https://[locator].snowflakecomputing.com/polaris/api/catalog'
       warehouse = 'snowflake_managed'
   )
   rest_authentication = (
       type = oauth
       oauth_client_id = '[client_id]'
       oauth_client_secret = '[client_secret]'
       oauth_allowed_scopes = ( 'PRINCIPAL_ROLE:ALL' )
   )
   enabled = true
   ;
   ```
   
   With this I create a table in the catalog:
   
   ```
   create or replace iceberg table iceberg.jira.roadmap (
       id int
       , [...]
   )
   external_volume = 'ev_iceberg_tables'
   catalog = 'SNOWFLAKE'
   base_location = 'jira/roadmap/'
   catalog_sync = 'i_iceberg_catalog'
   ;
   ```
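Putting the pieces together: the external volume's `storage_base_url` names the storage account, the `catalog` container and the `snowflake_managed` prefix, and `base_location` adds `jira/roadmap/`. The Python sketch below splits the `azure://` URL with `urllib.parse` and rebuilds the `scheme://container@host/prefix/table` form that appears in the table metadata further down. It assumes a made-up account name `myaccount` in place of `[storage_account]`, and the helper names (`split_azure_base_url`, `to_location`) are hypothetical. It only shows how the parts line up, not how Snowflake or Open Catalog actually generate locations.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the redacted storage_base_url in the issue.
STORAGE_BASE_URL = "azure://myaccount.blob.core.windows.net/catalog/snowflake_managed/"


def split_azure_base_url(url: str) -> dict:
    """Split an azure:// base URL into host, container and prefix."""
    parsed = urlparse(url)
    # The first path segment is the container; the remainder is the prefix inside it.
    container, _, prefix = parsed.path.lstrip("/").partition("/")
    return {"host": parsed.netloc, "container": container, "prefix": prefix.strip("/")}


def to_location(base_url: str, scheme: str, base_location: str) -> str:
    """Build a scheme://container@host/prefix/base_location style URL (illustrative only)."""
    parts = split_azure_base_url(base_url)
    # Skip empty segments so a missing prefix does not produce a double slash.
    path = "/".join(p for p in (parts["prefix"], base_location.strip("/")) if p)
    return f"{scheme}://{parts['container']}@{parts['host']}/{path}"


if __name__ == "__main__":
    print(split_azure_base_url(STORAGE_BASE_URL))
    print(to_location(STORAGE_BASE_URL, "wasbs", "jira/roadmap/"))
```

With the placeholder account, `to_location(STORAGE_BASE_URL, "wasbs", "jira/roadmap/")` yields the same shape as the `location` field reported for the table.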
   
   This creates the table in Open Catalog, and I can populate it just fine. But when I try to read from the table using PyIceberg or Polars, this error is returned:
   
   > ValueError: No registered filesystem for scheme: wasbs
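The error message suggests that the FileIO chooses a filesystem implementation from the URI scheme of each file location and gives up when the scheme is unknown. Below is a minimal Python sketch of that kind of lookup. The `REGISTERED_SCHEMES` set and the `resolve_scheme` helper are hypothetical and hand-picked for illustration; they are not PyIceberg's actual scheme table, which may differ between versions.

```python
from urllib.parse import urlparse

# Hypothetical registry for illustration only; not PyIceberg's real list.
REGISTERED_SCHEMES = {"file", "s3", "s3a", "s3n", "gs", "gcs", "abfs", "abfss"}


def resolve_scheme(location: str) -> str:
    """Return the URI scheme of a location, or raise like a FileIO lookup would."""
    # Plain paths without a scheme are treated as local files.
    scheme = urlparse(location).scheme or "file"
    if scheme not in REGISTERED_SCHEMES:
        raise ValueError(f"No registered filesystem for scheme: {scheme}")
    return scheme
```

In this sketch an `abfss://` location resolves, while a `wasbs://` location raises the same kind of `ValueError` as in the report.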
   
   So I checked the table's metadata:
   
   ```
   from pyiceberg.catalog import load_catalog
   
   catalog = load_catalog(
       **{
           "type": "rest",
           # Ask the REST catalog to vend short-lived storage credentials
           "header.X-Iceberg-Access-Delegation": "vended-credentials",
           "uri": "https://[locator].snowflakecomputing.com/polaris/api/catalog",
           # OAuth client credentials of the Open Catalog service principal
           "credential": "[open_catalog_client_id]:[open_catalog_client_secret]",
           "scope": "PRINCIPAL_ROLE:pyIceberg",
           "warehouse": "snowflake_managed",
           "token-refresh-enabled": "true",
           # Force the fsspec-based FileIO implementation
           "py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
       }
   )
   
   table = catalog.load_table('ICEBERG.JIRA.ROADMAP')
   
   table.metadata
   ```
   
   > TableMetadataV2(location='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap',
   > table_uuid=UUID('35b[…]'), last_updated_ms=1738578925967, last_column_id=19,
   > schemas=[Schema(NestedField(field_id=1, name='ID', […], schema_id=0, identifier_field_ids=[])], current_schema_id=0,
   > partition_specs=[PartitionSpec(spec_id=0)], default_spec_id=0, last_partition_id=999, properties={'format-version': '2'},
   > current_snapshot_id=78408874928435018,
   > snapshots=[Snapshot(snapshot_id=3032990014606473543, parent_snapshot_id=None, sequence_number=1, timestamp_ms=1738578919582,
   > manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578919582000000-5714c4a4-11e8-4c0a-b89b-cab4ea909f97.avro',
   > summary=None, schema_id=0), Snapshot(snapshot_id=78408874928435018, parent_snapshot_id=None, sequence_number=2, timestamp_ms=1738578925967,
   > manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578925967000000-fbf8b14b-e0ba-4bf5-bfde-5c6cf88251ad.avro',
   > summary=Summary(Operation.APPEND, **{'manifests-kept': '0', 'added-files-size': '112128', 'total-records': '708', 'manifests-created': '1', 'total-data-files': '8', 'manifests-replaced': '0', 'added-data-files': '8', 'added-records': '708', 'total-files-size': '112128'}), schema_id=0)],
   > snapshot_log=[SnapshotLogEntry(snapshot_id=3032990014606473543, timestamp_ms=1738578919582), SnapshotLogEntry(snapshot_id=78408874928435018, timestamp_ms=1738578925967)],
   > metadata_log=[], sort_orders=[SortOrder(order_id=0)], default_sort_order_id=0,
   > refs={'main': SnapshotRef(snapshot_id=78408874928435018, snapshot_ref_type=SnapshotRefType.BRANCH, min_snapshots_to_keep=None, max_snapshot_age_ms=None, max_ref_age_ms=None)},
   > format_version=2, last_sequence_number=2)
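The locations in this output can be compared by scheme to make the mismatch explicit. The Python sketch below maps named locations to their URI schemes and flags when they differ. The `SAMPLE` values are hypothetical: the account name and file names are placeholders, not the real ones from the table.

```python
from urllib.parse import urlparse


def schemes_by_location(locations: dict[str, str]) -> dict[str, str]:
    """Map each named location to its URI scheme (e.g. 'wasbs', 'abfss')."""
    return {name: urlparse(uri).scheme for name, uri in locations.items()}


def has_scheme_mismatch(locations: dict[str, str]) -> bool:
    """Return True if the locations do not all use the same URI scheme."""
    return len(set(schemes_by_location(locations).values())) > 1


# Hypothetical sample mirroring the issue; account and file names are placeholders.
SAMPLE = {
    "table.metadata.location": "wasbs://catalog@myaccount.blob.core.windows.net/snowflake_managed/jira/roadmap",
    "manifest_list": "wasbs://catalog@myaccount.blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-example.avro",
    "table.metadata_location": "abfss://catalog@myaccount.blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/example.metadata.json",
}

print(schemes_by_location(SAMPLE))
print(has_scheme_mismatch(SAMPLE))
```

With a live table, the same dictionary could be filled from `table.metadata.location`, the `manifest_list` of each entry in `table.metadata.snapshots`, and `table.metadata_location`, all of which appear in the issue.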
   
   Apparently the `wasbs` scheme was written into the metadata by either Open Catalog or Snowflake, even though the table's metadata file location uses the `abfss` scheme:
   
   `table.metadata_location`
   
   > abfss://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/[…].metadata.json
   
   There is clearly a discrepancy between the `wasbs://` locations in `table.metadata` and the `abfss://` location in `table.metadata_location`, and as a result I can't run `table.scan().to_arrow()` on the table.
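One possible client-side stopgap is to rewrite `wasbs://` locations to `abfss://` before they are handed to a FileIO. The Python sketch below is a hypothetical helper, not part of PyIceberg. By default it swaps only the scheme, which matches the `abfss://…blob.core.windows.net` form of `table.metadata_location` above. The optional switch to the `dfs.core.windows.net` endpoint is an assumption about what a given ADLS client expects. Whether the rewritten URL is readable depends on the FileIO and credentials in use, and the helper does not change the metadata stored in the catalog.

```python
WASBS_PREFIX = "wasbs://"
BLOB_HOST = ".blob.core.windows.net"
DFS_HOST = ".dfs.core.windows.net"


def wasbs_to_abfss(location: str, use_dfs_endpoint: bool = False) -> str:
    """Hypothetical helper: rewrite a wasbs:// location to abfss://.

    By default only the scheme changes. With use_dfs_endpoint=True the
    Blob endpoint host is also swapped for the Data Lake Storage (dfs) host.
    """
    if not location.startswith(WASBS_PREFIX):
        return location  # leave abfss://, s3://, local paths, etc. untouched
    rewritten = "abfss://" + location[len(WASBS_PREFIX):]
    if use_dfs_endpoint:
        rewritten = rewritten.replace(BLOB_HOST, DFS_HOST, 1)
    return rewritten
```

For example, `wasbs_to_abfss("wasbs://catalog@myaccount.blob.core.windows.net/snowflake_managed/jira/roadmap")` returns the same path with an `abfss://` prefix.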
   
   
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [x] I cannot contribute a fix for this bug at this time

