antonlin1 opened a new pull request, #3005: URL: https://github.com/apache/iceberg-python/pull/3005
## Summary

- When `adls.account-name` is not in catalog/table properties (common for tables created by Spark/Hadoop), `FsspecFileIO` created `AzureBlobFileSystem` with `account_name=None`
- adlfs `_strip_protocol()` strips `abfss://[email protected]/path` to `container/path`, losing the storage account info, causing `FileNotFoundError`
- The fix extracts `account_name` from the URI hostname as a last-resort fallback in `_adls()`, after SAS token extraction and explicit property checks

### Priority order for account_name resolution:

1. Explicit `adls.account-name` property
2. SAS token key extraction (existing behavior)
3. **NEW**: URI hostname extraction (e.g. `usagestorageprod.dfs.core.windows.net` → `usagestorageprod`)

### Root cause

Spark/Java Iceberg uses Hadoop's `FileSystem` API, which resolves schemeless paths against `fs.defaultFS`. PyIceberg has no equivalent: `FsspecFileIO` relies entirely on `adls.account-name` in properties, which Spark-created tables typically don't set.

## Test plan

- [x] `test_adls_account_name_extracted_from_uri_hostname`: verifies account extraction from full ABFSS URI
- [x] `test_adls_account_name_not_overridden_when_in_properties`: verifies explicit property takes priority
- [x] Existing `test_adls_account_name_sas_token_extraction` still passes (SAS token takes priority over hostname)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
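
### Illustrative sketch of the resolution order

For reviewers who want to see the priority order from the summary in one place, here is a minimal standalone sketch. It is not the code in this PR: `resolve_account_name` is a hypothetical helper, the `adls.sas-token.<account>...windows.net` key shape and the `.windows.net` suffix check are assumptions made for illustration, and the example URI is made up from the hostname quoted above.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

ADLS_ACCOUNT_NAME = "adls.account-name"
# Assumed prefix for per-account SAS token properties (illustrative only).
ADLS_SAS_TOKEN_PREFIX = "adls.sas-token."


def resolve_account_name(properties: Dict[str, str], location: str) -> Optional[str]:
    """Hypothetical helper mirroring the three-step priority order."""
    # 1. An explicit property always wins.
    explicit = properties.get(ADLS_ACCOUNT_NAME)
    if explicit:
        return explicit

    # 2. Fall back to the account embedded in a SAS token property key,
    #    assumed to look like "adls.sas-token.<account>.dfs.core.windows.net".
    for key in properties:
        if key.startswith(ADLS_SAS_TOKEN_PREFIX) and key.endswith(".windows.net"):
            return key[len(ADLS_SAS_TOKEN_PREFIX):].split(".")[0]

    # 3. Last resort: take the first label of the URI hostname.
    #    urlparse drops the "container@" userinfo part from .hostname.
    hostname = urlparse(location).hostname
    if hostname and hostname.endswith(".windows.net"):
        return hostname.split(".")[0]

    # Schemeless paths such as "container/path" carry no account information.
    return None


uri = "abfss://container@usagestorageprod.dfs.core.windows.net/path"
print(resolve_account_name({}, uri))
print(resolve_account_name({ADLS_ACCOUNT_NAME: "explicit"}, uri))
```

Doing the hostname step last keeps existing configurations unchanged: any table that already sets `adls.account-name` or a SAS token key resolves exactly as before, and only tables with neither pick up the account from the URI.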
