pankajastro opened a new issue, #37834:
URL: https://github.com/apache/airflow/issues/37834
### Apache Airflow version
main (development)
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
The DAG below is unable to use credentials from its Google Cloud connection. I
tried putting both the service account key JSON and a path to it in the
connection, without success. It does work when I instead set the env var
GOOGLE_APPLICATION_CREDENTIALS to the service account key JSON path.
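For context, the workaround that does make credentials available looks like this (the key path is illustrative, not from the report):

```shell
# Workaround from the report: point GOOGLE_APPLICATION_CREDENTIALS at the
# service-account key file before starting Airflow (path is illustrative).
export GOOGLE_APPLICATION_CREDENTIALS=/usr/local/airflow/keys/sa-key.json
echo "$GOOGLE_APPLICATION_CREDENTIALS"
```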
```python
import pendulum

from airflow.decorators import dag, task
from airflow.io.path import ObjectStoragePath

base = ObjectStoragePath("gs://airflow-tutorial-data1/", conn_id="gcp_conn_id")


@dag(
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
)
def gcs_objectstorage():
    @task
    def store_data() -> ObjectStoragePath:
        import pandas as pd

        logical_date = pd.Timestamp.now()
        formatted_date = logical_date.strftime("%Y%m%d")
        path = base / f"air_quality_{formatted_date}.parquet"
        aq_fields = {
            "calories": "int32",
            "duration": "int32",
        }
        data = {
            "calories": 420,
            "duration": 50,
        }
        df = pd.DataFrame(data, index=[0]).astype(aq_fields)
        with path.open("wb") as file:
            df.to_parquet(file)
        return path

    store_data()


gcs_objectstorage()
```
Without `GOOGLE_APPLICATION_CREDENTIALS` set, I get the error below:
```
c434c7303a41
*** Found local files:
*** *
/usr/local/airflow/logs/dag_id=gcs_objectstorage/run_id=manual__2024-03-01T17:17:58.472629+00:00/task_id=store_data/attempt=1.log
[2024-03-01, 17:17:59 UTC] {taskinstance.py:1997} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=
[2024-03-01, 17:17:59 UTC] {taskinstance.py:1997} INFO - Dependencies all
met for dep_context=requeueable deps ti=
[2024-03-01, 17:17:59 UTC] {taskinstance.py:2211} INFO - Starting attempt 1
of 1
[2024-03-01, 17:17:59 UTC] {taskinstance.py:2232} INFO - Executing on
2024-03-01 17:17:58.472629+00:00
[2024-03-01, 17:17:59 UTC] {standard_task_runner.py:60} INFO - Started
process 194 to run task
[2024-03-01, 17:17:59 UTC] {standard_task_runner.py:87} INFO - Running:
['airflow', 'tasks', 'run', 'gcs_objectstorage', 'store_data',
'manual__2024-03-01T17:17:58.472629+00:00', '--job-id', '150', '--raw',
'--subdir', 'DAGS_FOLDER/obj_storage1.py', '--cfg-path', '/tmp/tmpfmrrthnh']
[2024-03-01, 17:17:59 UTC] {standard_task_runner.py:88} INFO - Job 150:
Subtask store_data
[2024-03-01, 17:17:59 UTC] {task_command.py:424} INFO - Running on host
c434c7303a41
[2024-03-01, 17:17:59 UTC] {taskinstance.py:2537} INFO - Exporting env vars:
AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='gcs_objectstorage'
AIRFLOW_CTX_TASK_ID='store_data'
AIRFLOW_CTX_EXECUTION_DATE='2024-03-01T17:17:58.472629+00:00'
AIRFLOW_CTX_TRY_NUMBER='1'
AIRFLOW_CTX_DAG_RUN_ID='manual__2024-03-01T17:17:58.472629+00:00'
[2024-03-01, 17:18:00 UTC] {connection.py:269} WARNING - Connection schemes
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
[2024-03-01, 17:18:00 UTC] {base.py:83} INFO - Using connection ID
'gcp_conn_id' for task execution.
[2024-03-01, 17:18:03 UTC] {_metadata.py:139} WARNING - Compute Engine
Metadata server unavailable on attempt 1 of 3. Reason: timed out
[2024-03-01, 17:18:06 UTC] {_metadata.py:139} WARNING - Compute Engine
Metadata server unavailable on attempt 2 of 3. Reason: timed out
[2024-03-01, 17:18:06 UTC] {_metadata.py:139} WARNING - Compute Engine
Metadata server unavailable on attempt 3 of 3. Reason: [Errno 111] Connection
refused
[2024-03-01, 17:18:06 UTC] {_default.py:338} WARNING - Authentication failed
using Compute Engine authentication due to unavailable metadata server.
[2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine
Metadata server unavailable on attempt 1 of 5. Reason:
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries
exceeded with url:
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno
-2] Name or service not known)"))
[2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine
Metadata server unavailable on attempt 2 of 5. Reason:
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries
exceeded with url:
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno
-2] Name or service not known)"))
[2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine
Metadata server unavailable on attempt 3 of 5. Reason:
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries
exceeded with url:
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno
-2] Name or service not known)"))
[2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine
Metadata server unavailable on attempt 4 of 5. Reason:
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries
exceeded with url:
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno
-2] Name or service not known)"))
[2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine
Metadata server unavailable on attempt 5 of 5. Reason:
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries
exceeded with url:
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno
-2] Name or service not known)"))
[2024-03-01, 17:18:06 UTC] {retry.py:157} ERROR - _request non-retriable
exception: Anonymous caller does not have storage.objects.create access to the
Google Cloud Storage object. Permission 'storage.objects.create' denied on
resource (or it may not exist)., 401
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 123,
in retry_request
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 430, in
_request
validate_response(status, contents, path, args)
File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 110,
in validate_response
raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create
access to the Google Cloud Storage object. Permission 'storage.objects.create'
denied on resource (or it may not exist)., 401
[2024-03-01, 17:18:06 UTC] {taskinstance.py:2774} ERROR - Task failed with
exception
Traceback (most recent call last):
File
"/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line
447, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line
417, in _execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/decorators/base.py",
line 238, in execute
return_value = super().execute(context)
^^^^^^^^^^^^^^^^^^^^^^^^
File
"/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line
200, in execute
return_value = self.execute_callable()
^^^^^^^^^^^^^^^^^^^^^^^
File
"/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line
217, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/dags/obj_storage1.py", line 36, in store_data
with path.open("wb") as file:
File "/usr/local/lib/python3.11/site-packages/fsspec/spec.py", line 1965,
in __exit__
self.close()
File "/usr/local/lib/python3.11/site-packages/fsspec/spec.py", line 1932,
in close
self.flush(force=True)
File "/usr/local/lib/python3.11/site-packages/fsspec/spec.py", line 1798,
in flush
self._initiate_upload()
File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 1799,
in _initiate_upload
self.location = sync(
^^^^^
File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 103,
in sync
raise return_result
File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in
_runner
result[0] = await coro
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 1916,
in initiate_upload
headers, _ = await fs._call(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 437, in
_call
status, headers, info, contents = await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/decorator.py", line 221, in
fun
return await caller(func, *(extras + args), **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 158,
in retry_request
raise e
File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 123,
in retry_request
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 430, in
_request
validate_response(status, contents, path, args)
File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 110,
in validate_response
raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create
access to the Google Cloud Storage object. Permission 'storage.objects.create'
denied on resource (or it may not exist)., 401
[2024-03-01, 17:18:06 UTC] {taskinstance.py:1168} INFO - Marking task as
FAILED. dag_id=gcs_objectstorage, task_id=store_data,
execution_date=20240301T171758, start_date=20240301T171759,
end_date=20240301T171806
[2024-03-01, 17:18:06 UTC] {standard_task_runner.py:107} ERROR - Failed to
execute job 150 for task store_data (Anonymous caller does not have
storage.objects.create access to the Google Cloud Storage object. Permission
'storage.objects.create' denied on resource (or it may not exist)., 401; 194)
[2024-03-01, 17:18:06 UTC] {local_task_job_runner.py:234} INFO - Task exited
with return code 1
[2024-03-01, 17:18:06 UTC] {taskinstance.py:3357} INFO - 0 downstream tasks
scheduled from follow-on schedule check
```
### What you think should happen instead?
The DAG above should be able to use credentials from the Airflow connection;
it should not require exporting the GOOGLE_APPLICATION_CREDENTIALS environment
variable.
### How to reproduce
Run the DAG above with a Google Cloud connection configured but without the
GOOGLE_APPLICATION_CREDENTIALS environment variable set.
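A minimal sketch of the connection extras that were tried (the `key_path` field name follows the Google provider's connection form; the path is illustrative, not from the report):

```python
import json

# Illustrative extras for a google_cloud_platform connection; the report tried
# both the service-account key JSON itself and a path to it, without success.
extra = {"key_path": "/usr/local/airflow/keys/sa-key.json"}
print(json.dumps(extra))
```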
### Operating System
Linux
### Versions of Apache Airflow Providers
_No response_
### Deployment
Docker-Compose
### Deployment details
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)