martinseifertprojuventute opened a new issue, #1606:
URL: https://github.com/apache/iceberg-python/issues/1606
### Apache Iceberg version
0.8.1 (latest release)
### Please describe the bug 🐞
I have an external volume in Snowflake pointing to Azure Data Lake Storage Gen2 (ADLS Gen2):
```
create or replace external volume ev_iceberg_tables
  storage_locations =
  ((
    name = 'iceberg_snowflake_managed'
    storage_provider = 'AZURE'
    storage_base_url = 'azure://[storage_account].blob.core.windows.net/catalog/snowflake_managed/'
    azure_tenant_id = '[tenant]'
  ))
;
```
The container is called “catalog”, and the Open Catalog catalog I want to point to
is called “snowflake_managed”. This is my catalog integration:
```
create or replace catalog integration i_iceberg_catalog
  catalog_source = polaris
  table_format = iceberg
  catalog_namespace = 'default'
  rest_config = (
    catalog_uri = 'https://[locator].snowflakecomputing.com/polaris/api/catalog'
    warehouse = 'snowflake_managed'
  )
  rest_authentication = (
    type = oauth
    oauth_client_id = '[client_id]'
    oauth_client_secret = '[client_secret]'
    oauth_allowed_scopes = ( 'PRINCIPAL_ROLE:ALL' )
  )
  enabled = true
;
```
With this I create a table in the catalog:
```
create or replace iceberg table iceberg.jira.roadmap (
    id int
  , [...]
)
  external_volume = 'ev_iceberg_tables'
  catalog = 'SNOWFLAKE'
  base_location = 'jira/roadmap/'
  catalog_sync = 'i_iceberg_catalog'
;
```
This creates the table in Open Catalog, and I can populate it just
fine. But when I try to read from the table using PyIceberg or Polars, this
error is returned:
> ValueError: No registered filesystem for scheme: wasbs
So I checked the table's metadata:
```
from pyiceberg.catalog import load_catalog

# Connect to Open Catalog (Polaris) through the Iceberg REST API.
# Bracketed values are placeholders.
catalog = load_catalog(
    **{
        "type": "rest",
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
        "uri": "https://[locator].snowflakecomputing.com/polaris/api/catalog",
        "credential": "[open_catalog_client_id]:[open_catalog_client_secret]",
        "scope": "PRINCIPAL_ROLE:pyIceberg",
        "warehouse": "snowflake_managed",
        "token-refresh-enabled": "true",
        # The FileIO implementation is selected by its dotted path.
        "py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
    }
)

table = catalog.load_table("ICEBERG.JIRA.ROADMAP")
table.metadata  # inspect the loaded table metadata
```
> TableMetadataV2(location='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap',
table_uuid=UUID('35b[…]'), last_updated_ms=1738578925967, last_column_id=19,
schemas=[Schema(NestedField(field_id=1, name='ID', […], schema_id=0,
identifier_field_ids=[])], current_schema_id=0,
partition_specs=[PartitionSpec(spec_id=0)], default_spec_id=0,
last_partition_id=999, properties={'format-version': '2'},
current_snapshot_id=78408874928435018,
snapshots=[Snapshot(snapshot_id=3032990014606473543, parent_snapshot_id=None,
sequence_number=1, timestamp_ms=1738578919582,
manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578919582000000-5714c4a4-11e8-4c0a-b89b-cab4ea909f97.avro',
summary=None, schema_id=0), Snapshot(snapshot_id=78408874928435018,
parent_snapshot_id=None, sequence_number=2, timestamp_ms=1738578925967,
manifest_list='wasbs://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/snap-1738578925967000000-fbf8b14b-e0ba-4bf5-bfde-5c6cf88251ad.avro',
summary=Summary(Operation.APPEND, **{'manifests-kept': '0',
'added-files-size': '112128', 'total-records': '708', 'manifests-created': '1',
'total-data-files': '8', 'manifests-replaced': '0', 'added-data-files': '8',
'added-records': '708', 'total-files-size': '112128'}), schema_id=0)],
snapshot_log=[SnapshotLogEntry(snapshot_id=3032990014606473543,
timestamp_ms=1738578919582), SnapshotLogEntry(snapshot_id=78408874928435018,
timestamp_ms=1738578925967)], metadata_log=[],
sort_orders=[SortOrder(order_id=0)], default_sort_order_id=0, refs={'main':
SnapshotRef(snapshot_id=78408874928435018,
snapshot_ref_type=SnapshotRefType.BRANCH, min_snapshots_to_keep=None,
max_snapshot_age_ms=None, max_ref_age_ms=None)}, format_version=2,
last_sequence_number=2)
Apparently the `wasbs` scheme was written into the metadata by either Open
Catalog or Snowflake, even though the metadata file itself is reported under
the `abfss` scheme:
`table.metadata_location`
> abfss://catalog@[storage_account].blob.core.windows.net/snowflake_managed/jira/roadmap/metadata/[…].metadata.json

So `table.metadata` and `table.metadata_location` disagree on the URI scheme,
and as a result I can't run `table.scan().to_arrow()` on the table.
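
To make the mismatch concrete, here is a minimal sketch that compares the URI schemes of two locations and rewrites only the scheme of one of them. The helpers `uri_scheme`, `with_scheme`, and `schemes_differ` are hypothetical names for illustration and are not part of the PyIceberg API; `exampleaccount` stands in for the real storage account. Whether swapping the scheme alone is enough for a given filesystem implementation is an assumption I have not verified.

```python
from urllib.parse import urlsplit, urlunsplit


# Hypothetical helpers for illustration only; not part of PyIceberg.
def uri_scheme(location: str) -> str:
    """Return the lower-cased URI scheme of a storage location."""
    return urlsplit(location).scheme.lower()


def with_scheme(location: str, scheme: str) -> str:
    """Return the same location with only its scheme replaced."""
    return urlunsplit(urlsplit(location)._replace(scheme=scheme))


def schemes_differ(table_location: str, metadata_location: str) -> bool:
    """True when the two locations use different URI schemes."""
    return uri_scheme(table_location) != uri_scheme(metadata_location)


# "exampleaccount" is a made-up storage account name.
table_location = (
    "wasbs://catalog@exampleaccount.blob.core.windows.net"
    "/snowflake_managed/jira/roadmap"
)
metadata_location = (
    "abfss://catalog@exampleaccount.blob.core.windows.net"
    "/snowflake_managed/jira/roadmap/metadata/v1.metadata.json"
)

if schemes_differ(table_location, metadata_location):
    print(with_scheme(table_location, uri_scheme(metadata_location)))
```

In my case the first argument would be `table.metadata.location` and the second `table.metadata_location`.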
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [x] I cannot contribute a fix for this bug at this time