samuelkhtu opened a new issue, #53333:
URL: https://github.com/apache/airflow/issues/53333
### Apache Airflow version
Other Airflow 2 version (please specify below)
### If "Other Airflow 2 version" selected, which one?
2.10.5
### What happened?
In `airflow/providers/microsoft/azure/fs/adls.py`, the `get_fs()` function
constructs a dictionary of `options` by pulling connection information from
Airflow's connection system and then passes these options to
`AzureBlobFileSystem(**options)`.
By default, the function constructs `account_url` using
`parse_blob_account_url(conn.host, conn.login)`, which assumes the Azure Blob
endpoint will use the standard `core.windows.net` domain. While this works for
default endpoints, it does not support scenarios where users want to override
the domain — for example, when using a private endpoint like
`.core.mydomain.io`.
The root issue is:
* `account_host` (the correct field expected by `adlfs.AzureBlobFileSystem`)
is not included in the list of parsed fields.
* Even if the user provides `account_host` in the connection extras,
`get_fs()` ignores it and always constructs `account_url` using the hardcoded
domain logic.
* `AzureBlobFileSystem` does not support `account_url` as a constructor
parameter, so the custom domain is never applied — silently falling back to the
default.
As a result, there is **no way for users to override the account URL** via
Airflow connection configuration, even though `adlfs.AzureBlobFileSystem`
supports this through its `account_host` parameter.
This blocks use cases such as:
* Custom domains
* Private endpoints
* Sovereign or air-gapped cloud regions
This limitation exists even though the underlying library (`adlfs`) already
supports the necessary parameter (`account_host`).
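To make the failure mode concrete, here is a minimal, hypothetical simplification of the default URL construction described above. The real logic lives in `parse_blob_account_url` inside the Azure provider; this sketch only assumes it derives the account from `host`/`login` and hardcodes the public-cloud suffix, which is why a custom domain never survives the call:

```python
# Hypothetical simplification of get_fs()'s default behaviour.
# Assumption: parse_blob_account_url() builds the URL from host or login
# and always appends the public-cloud suffix.
def default_account_url(host, login):
    account = host or login or ""
    # The "core.windows.net" suffix is hardcoded, so private endpoints,
    # sovereign clouds, or custom domains cannot be expressed here.
    return f"https://{account}.blob.core.windows.net"
```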
### What you think should happen instead?

### What you think should happen instead?
The `get_fs()` function should support passing a user-defined `account_host`
value from the Airflow connection extras directly to the `AzureBlobFileSystem`
constructor.
Specifically:
* Add `"account_host"` to the list of fields extracted from `extras`.
* If `account_host` is provided, it should be passed directly to
`AzureBlobFileSystem` as a supported parameter.
* Currently, `get_fs()` sets `account_url` using
`parse_blob_account_url(...)`, but `account_url` is **not** a valid parameter
for `AzureBlobFileSystem`. It can be removed or renamed to `account_host`.
However, to maintain backward compatibility:
* Retain the existing `account_url` logic as a fallback.
* Prefer `account_host` whenever it is explicitly defined in the extras.
This would allow users to configure non-standard Azure Blob endpoints — such
as custom domains or private links — via the standard Airflow connection
mechanism, while maintaining compatibility with existing deployments.
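The preference order proposed above can be sketched as a small, self-contained helper. The function name `resolve_endpoint_options` and its signature are hypothetical (the real change would live inside `get_fs()`), and the fallback URL format assumes the current `parse_blob_account_url` behaviour:

```python
# Hypothetical sketch of the proposed precedence: an explicit account_host
# from connection extras wins; otherwise fall back to the URL currently
# derived from host/login (existing behaviour, kept for compatibility).
def resolve_endpoint_options(extras, host, login):
    options = {}
    account_host = extras.get("account_host")
    if account_host:
        # Passed straight through to adlfs.AzureBlobFileSystem(**options).
        options["account_host"] = account_host
    else:
        account = host or login or ""
        options["account_url"] = f"https://{account}.blob.core.windows.net"
    return options
```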
### How to reproduce
1. Create an Airflow `Connection` object with a custom domain in the `extra`
field:
```python
from airflow.models import Connection
from airflow.providers.microsoft.azure.fs.adls import get_fs

conn = Connection(
    conn_id="testconn",
    conn_type="wasb",
    login="testaccountname",
    password="p",
    host="testaccountID",
    extra={
        "account_name": "n",
        "tenant_id": "t",
        "account_host": "https://testaccountname.blob.core.customdomain.io",
    },
)
# Insert or mock this connection in Airflow metadata
```
2. Call `get_fs()` with this connection ID:
```python
fs = get_fs("testconn")
```
3. Observe that:
* Despite `account_host` being set to a custom domain in extras,
`get_fs()` ignores it.
* The `options` passed to `adlfs.AzureBlobFileSystem` do **not** include
`account_host`.
* Instead, `get_fs()` builds and passes an `account_url` derived from the
default domain based on `host` and `login`.
* As a result, the custom domain override does not take effect.
This confirms the current limitation that `account_host` cannot be used to
override the default Azure Blob endpoint via Airflow’s connection system.
### Operating System
macOS 15.5
### Versions of Apache Airflow Providers
apache-airflow-providers-microsoft-azure==12.5.0
### Deployment
Astronomer
### Deployment details
k8s
### Anything else?
* This is a backward-compatible enhancement since adding support for
`account_host` does not remove or change existing parameters.
* Supporting `account_host` enables Airflow to better integrate with Azure
environments using private endpoints, custom domains, or sovereign clouds.
* The underlying `adlfs.AzureBlobFileSystem` already supports
`account_host`, so this change leverages existing functionality.
* Implementing this will improve user experience and reduce the need for
workarounds or custom patches.
* I want to submit a PR but would appreciate suggestions on the best
approach.
* My current thinking is to simply add `"account_host"` to the existing
`fields` list in `get_fs()` so that this block picks it up automatically:
```python
fields = [
    "account_name",
    "account_key",
    "sas_token",
    "tenant_id",
    "managed_identity_client_id",
    "workload_identity_client_id",
    "workload_identity_tenant_id",
    "anon",
    "account_host",  # <- add here
]
for field in fields:
    value = get_field(
        conn_id=conn_id, conn_type=conn_type, extras=extras, field_name=field
    )
    if value is not None:
        if value == "":
            options.pop(field, "")
        else:
            options[field] = value
```
* Would this be the preferred way, or are there alternative approaches to
consider?
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)