gabor-one opened a new issue, #39028:
URL: https://github.com/apache/airflow/issues/39028
### Apache Airflow Provider(s)
microsoft-azure
### Versions of Apache Airflow Providers
apache-airflow-providers-microsoft-azure==9.0.1
### Apache Airflow version
2.9.0
### Operating System
Debian GNU/Linux 12 (bookworm)
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
- Platform: Kubernetes (AKS)
- Executor: KubernetesExecutor
- Using Azure Key-Vault as the secret provider via Workload Identity.
- Using azure_remote_logging (Azure Blob Storage)
### What happened
If the connection is defined in Azure Key Vault, the task pods cannot
write logs to Azure Blob Storage at the end of execution. A stray
' ' (space character) appears in the storage account URL (see the last line of the log).
The WASB Airflow connection is defined in Key Vault as:
`wasb://https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net`
If the connection is created via the UI and 'remote_log_conn_id' is changed to
use that connection for logging, everything works fine.
Logs:
```
[2024-04-15, 12:03:07 UTC] {retries.py:91} DEBUG - Running
Job._fetch_from_db with retries. Try 1 of 3
[2024-04-15, 12:03:07 UTC] {retries.py:91} DEBUG - Running
Job._update_heartbeat with retries. Try 1 of 3
[2024-04-15, 12:03:07 UTC] {job.py:214} DEBUG - [heartbeat]
[2024-04-15, 12:03:12 UTC] {retries.py:91} DEBUG - Running
Job._fetch_from_db with retries. Try 1 of 3
[2024-04-15, 12:03:12 UTC] {retries.py:91} DEBUG - Running
Job._update_heartbeat with retries. Try 1 of 3
[2024-04-15, 12:03:12 UTC] {job.py:214} DEBUG - [heartbeat]
[2024-04-15, 12:03:13 UTC] {taskinstance.py:441} ▼ Post task execution logs
[2024-04-15, 12:03:13 UTC] {taskinstance.py:2890} ERROR - Task failed with
exception
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line
1173, in _create_direct_connection
hosts = await asyncio.shield(host_resolved)
File
"/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line
884, in _resolve_host
addrs = await self._resolver.resolve(host, port, family=self._family)
File
"/home/airflow/.local/lib/python3.10/site-packages/aiohttp/resolver.py", line
33, in resolve
infos = await self._loop.getaddrinfo(
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 863, in
getaddrinfo
return await self.run_in_executor(
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in
run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/transport/_aiohttp.py",
line 294, in send
result = await self.session.request( # type: ignore
File
"/home/airflow/.local/lib/python3.10/site-packages/aiohttp/client.py", line
578, in _request
conn = await self._connector.connect(
File
"/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line
544, in connect
proto = await self._create_connection(req, traces, timeout)
File
"/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line
911, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File
"/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line
1187, in _create_direct_connection
raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host
<STORAGE_ACCOUNT_NAME> .blob.core.windows.net:443 ssl:default [Name or service
not known]
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
line 465, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
line 432, in _execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py",
line 400, in wrapper
return func(self, *args, **kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/decorators/base.py",
line 265, in execute
return_value = super().execute(context)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py",
line 400, in wrapper
return func(self, *args, **kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py",
line 235, in execute
return_value = self.execute_callable()
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py",
line 252, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/airflow/dags/repo/src/workflows/test.py", line 24, in
test_features
print(f"Got access to datalake. ls: {fs.ls(datalake_folder)}")
File "/home/airflow/.local/lib/python3.10/site-packages/fsspec/asyn.py",
line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/fsspec/asyn.py",
line 103, in sync
raise return_result
File "/home/airflow/.local/lib/python3.10/site-packages/fsspec/asyn.py",
line 56, in _runner
result[0] = await coro
File "/home/airflow/.local/lib/python3.10/site-packages/adlfs/spec.py",
line 823, in _ls
output = await self._ls_blobs(
File "/home/airflow/.local/lib/python3.10/site-packages/adlfs/spec.py",
line 724, in _ls_blobs
async for next_blob in blobs:
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/async_paging.py",
line 142, in __anext__
return await self.__anext__()
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/async_paging.py",
line 145, in __anext__
self._page = await self._page_iterator.__anext__()
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/async_paging.py",
line 94, in __anext__
self._response = await self._get_next(self.continuation_token)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/aio/_list_blobs_helper.py",
line 83, in _get_next_cb
return await self._command(
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py",
line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_generated/aio/operations/_container_operations.py",
line 1886, in list_blob_hierarchy_segment
pipeline_response: PipelineResponse = await self._client._pipeline.run(
# pylint: disable=protected-access
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 221, in run
return await first_node.send(pipeline_request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
[Previous line repeated 3 more times]
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_authentication_async.py",
line 100, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_redirect_async.py",
line 73, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies_async.py",
line 137, in send
raise err
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies_async.py",
line 111, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies_async.py",
line 64, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 69, in send
response = await self.next.send(request)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py",
line 106, in send
await self._sender.send(request.http_request, **request.context.options),
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/base_client_async.py",
line 175, in send
return await self._transport.send(request, **kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/transport/_aiohttp.py",
line 332, in send
raise ServiceRequestError(err, error=err) from err
azure.core.exceptions.ServiceRequestError: Cannot connect to host
<STORAGE_ACCOUNT_NAME> .blob.core.windows.net:443 ssl:default [Name or service
not known]
```
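The host name in the final error contains a space between the account name and `.blob.core.windows.net`. One quick thing to rule out is hidden whitespace in the value stored in Key Vault. The sketch below is only a diagnostic aid, not a confirmed cause: `find_whitespace` is a hypothetical helper and `myaccount` is a placeholder account name.

```python
def find_whitespace(value: str) -> list[tuple[int, str]]:
    """Return (index, repr) pairs for every whitespace character in value."""
    return [(i, repr(ch)) for i, ch in enumerate(value) if ch.isspace()]


# Hypothetical secret value with a trailing newline, as could happen when
# a secret is created from a file or a shell here-string.
secret_value = "wasb://https://myaccount.blob.core.windows.net\n"

hits = find_whitespace(secret_value)
if hits:
    print("whitespace found at:", hits)
else:
    print("no whitespace in the stored value")
```

Printing `repr(secret_value)` gives the same information at a glance; the helper just makes the positions explicit.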
The wasb-default connection defined in Key Vault, which produces the stray
space in the URL:
```
>airflow connections get wasb-default -o yaml
- conn_id: wasb-default
conn_type: wasb
description: null
extra_dejson: {}
get_uri: wasb://https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
host: https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
id: null
is_encrypted: null
is_extra_encrypted: null
login: null
password: null
port: null
schema: ''
```
The WASB connection defined via the UI, which works:
```
>airflow connections get abc -o yaml
- conn_id: abc
conn_type: wasb
description: ''
extra_dejson: {}
get_uri: wasb://https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
host: https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
id: '1'
is_encrypted: 'False'
is_extra_encrypted: 'False'
login: ''
password: null
port: null
schema: ''
```
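Comparing the two dumps field by field shows that `host` and `get_uri` are identical and that only metadata fields differ. A minimal sketch of that comparison, with the values copied from the two outputs above (`diff_fields` is a hypothetical helper, not part of Airflow):

```python
HOST = "https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net"

# Values as printed by `airflow connections get ... -o yaml` above.
key_vault_conn = {
    "conn_type": "wasb", "description": None, "extra_dejson": {},
    "get_uri": "wasb://" + HOST, "host": HOST, "id": None,
    "is_encrypted": None, "is_extra_encrypted": None,
    "login": None, "password": None, "port": None, "schema": "",
}
ui_conn = {
    "conn_type": "wasb", "description": "", "extra_dejson": {},
    "get_uri": "wasb://" + HOST, "host": HOST, "id": "1",
    "is_encrypted": "False", "is_extra_encrypted": "False",
    "login": "", "password": None, "port": None, "schema": "",
}


def diff_fields(a: dict, b: dict) -> dict:
    """Map each differing key to its (a, b) pair of values."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


differences = diff_fields(key_vault_conn, ui_conn)
for field, (kv_value, ui_value) in differences.items():
    print(f"{field}: key vault={kv_value!r}, ui={ui_value!r}")
```

Besides the database-only fields (`id`, `is_encrypted`, `is_extra_encrypted`), the dumps differ in `description` and `login`, which are `null` for the Key Vault connection and empty strings for the UI one. Whether that difference is related to the stray space is not established here.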
### What you think should happen instead
WASB connections defined via Key Vault should not produce an extra ' '
(space) character in the URL, just as connections created via the UI
don't.
### How to reproduce
1. Set up Azure Kubernetes (AKS) to use Workload Identity: attach a service account
to the pods, federate an identity to the service account, and give that federated
identity access to the Azure Storage Account.
2. Configure Airflow to use Azure Key Vault as the secrets backend.
3. Configure Airflow to use azure_remote_logging.
4. Create an Airflow WASB connection secret in Key Vault. Use the example from
above.
5. Run a DAG.
6. The task fails because it cannot write logs to the Storage
Container.
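
Steps 2 to 4 can be sketched as the environment variables the Airflow pods would need. This is an assumption about a typical setup, not the reporter's actual configuration: the vault URL, the `airflow-connections` prefix, and the `wasb-logs` folder are hypothetical placeholder values.

```python
import json

# Hypothetical placeholder values; replace with your own.
VAULT_URL = "https://my-vault.vault.azure.net/"
CONNECTIONS_PREFIX = "airflow-connections"
LOG_CONN_ID = "wasb-default"

airflow_env = {
    # Step 2: Azure Key Vault as the secrets backend.
    "AIRFLOW__SECRETS__BACKEND": (
        "airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend"
    ),
    "AIRFLOW__SECRETS__BACKEND_KWARGS": json.dumps(
        {"connections_prefix": CONNECTIONS_PREFIX, "vault_url": VAULT_URL}
    ),
    # Step 3: remote logging to Azure Blob Storage. The folder name starts
    # with "wasb" so that the WASB task handler is selected.
    "AIRFLOW__LOGGING__REMOTE_LOGGING": "True",
    "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER": "wasb-logs",
    "AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID": LOG_CONN_ID,
}

# Step 4: assuming the backend's default "-" separator, the connection is
# looked up under a secret named "<connections_prefix>-<conn_id>".
secret_name = f"{CONNECTIONS_PREFIX}-{LOG_CONN_ID}"
secret_value = "wasb://https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net"

for name, value in airflow_env.items():
    print(f"{name}={value}")
print(f"Key Vault secret: {secret_name} -> {secret_value}")
```

With the official Helm chart, these variables would typically be supplied through the chart's environment settings rather than set by hand.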
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)