pankajastro opened a new issue, #37834:
URL: https://github.com/apache/airflow/issues/37834

   ### Apache Airflow version
   
   main (development)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   The DAG below is unable to use credentials from a Google Cloud connection. I tried putting the service account key JSON itself into the connection, as well as the path to it. The DAG only works when I also set the env var `GOOGLE_APPLICATION_CREDENTIALS` to the service account key JSON path.
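   For context, this is roughly how I supplied the connection; a minimal sketch, assuming the `key_path` extra of the Google provider's connection and an illustrative key location (not the actual path I used):
   
   ```shell
   # Hypothetical connection definition for conn_id "gcp_conn_id".
   # Airflow maps the env var AIRFLOW_CONN_<CONN_ID> to a connection URI;
   # the key path below is illustrative, and the value is percent-encoded.
   export AIRFLOW_CONN_GCP_CONN_ID='google-cloud-platform://?key_path=%2Ffiles%2Fsa-key.json'
   ```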
   
   ```python
   import pendulum
   
   from airflow.decorators import dag, task
   from airflow.io.path import ObjectStoragePath
   
   base = ObjectStoragePath("gs://airflow-tutorial-data1/", 
conn_id="gcp_conn_id")
   
   
   @dag(
       schedule=None,
       start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
       catchup=False,
       tags=["example"],
   )
   def gcs_objectstorage():
       @task
       def store_data() -> ObjectStoragePath:
           import pandas as pd
   
           logical_date = pd.Timestamp.now()
           formatted_date = logical_date.strftime("%Y%m%d")
           path = base / f"air_quality_{formatted_date}.parquet"
   
           aq_fields = {
               "calories": "int32",
               "duration": "int32",
           }
   
           data = {
               "calories": 420,
               "duration": 50,
           }
   
           df = pd.DataFrame(data, index=[0]).astype(aq_fields)
           with path.open("wb") as file:
               df.to_parquet(file)
           return path
   
       store_data()
   
   
   gcs_objectstorage()
   
   ```
   Without `GOOGLE_APPLICATION_CREDENTIALS`, I get the error below:
   
   ```
   c434c7303a41
   *** Found local files:
   ***   * 
/usr/local/airflow/logs/dag_id=gcs_objectstorage/run_id=manual__2024-03-01T17:17:58.472629+00:00/task_id=store_data/attempt=1.log
   [2024-03-01, 17:17:59 UTC] {taskinstance.py:1997} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=
   [2024-03-01, 17:17:59 UTC] {taskinstance.py:1997} INFO - Dependencies all 
met for dep_context=requeueable deps ti=
   [2024-03-01, 17:17:59 UTC] {taskinstance.py:2211} INFO - Starting attempt 1 
of 1
   [2024-03-01, 17:17:59 UTC] {taskinstance.py:2232} INFO - Executing  on 
2024-03-01 17:17:58.472629+00:00
   [2024-03-01, 17:17:59 UTC] {standard_task_runner.py:60} INFO - Started 
process 194 to run task
   [2024-03-01, 17:17:59 UTC] {standard_task_runner.py:87} INFO - Running: 
['airflow', 'tasks', 'run', 'gcs_objectstorage', 'store_data', 
'manual__2024-03-01T17:17:58.472629+00:00', '--job-id', '150', '--raw', 
'--subdir', 'DAGS_FOLDER/obj_storage1.py', '--cfg-path', '/tmp/tmpfmrrthnh']
   [2024-03-01, 17:17:59 UTC] {standard_task_runner.py:88} INFO - Job 150: 
Subtask store_data
   [2024-03-01, 17:17:59 UTC] {task_command.py:424} INFO - Running  on host 
c434c7303a41
   [2024-03-01, 17:17:59 UTC] {taskinstance.py:2537} INFO - Exporting env vars: 
AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='gcs_objectstorage' 
AIRFLOW_CTX_TASK_ID='store_data' 
AIRFLOW_CTX_EXECUTION_DATE='2024-03-01T17:17:58.472629+00:00' 
AIRFLOW_CTX_TRY_NUMBER='1' 
AIRFLOW_CTX_DAG_RUN_ID='manual__2024-03-01T17:17:58.472629+00:00'
   [2024-03-01, 17:18:00 UTC] {connection.py:269} WARNING - Connection schemes 
(type: google_cloud_platform) shall not contain '_' according to RFC3986.
   [2024-03-01, 17:18:00 UTC] {base.py:83} INFO - Using connection ID 
'gcp_conn_id' for task execution.
   [2024-03-01, 17:18:03 UTC] {_metadata.py:139} WARNING - Compute Engine 
Metadata server unavailable on attempt 1 of 3. Reason: timed out
   [2024-03-01, 17:18:06 UTC] {_metadata.py:139} WARNING - Compute Engine 
Metadata server unavailable on attempt 2 of 3. Reason: timed out
   [2024-03-01, 17:18:06 UTC] {_metadata.py:139} WARNING - Compute Engine 
Metadata server unavailable on attempt 3 of 3. Reason: [Errno 111] Connection 
refused
   [2024-03-01, 17:18:06 UTC] {_default.py:338} WARNING - Authentication failed 
using Compute Engine authentication due to unavailable metadata server.
   [2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine 
Metadata server unavailable on attempt 1 of 5. Reason: 
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries 
exceeded with url: 
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused 
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno 
-2] Name or service not known)"))
   [2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine 
Metadata server unavailable on attempt 2 of 5. Reason: 
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries 
exceeded with url: 
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused 
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno 
-2] Name or service not known)"))
   [2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine 
Metadata server unavailable on attempt 3 of 5. Reason: 
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries 
exceeded with url: 
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused 
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno 
-2] Name or service not known)"))
   [2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine 
Metadata server unavailable on attempt 4 of 5. Reason: 
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries 
exceeded with url: 
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused 
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno 
-2] Name or service not known)"))
   [2024-03-01, 17:18:06 UTC] {_metadata.py:208} WARNING - Compute Engine 
Metadata server unavailable on attempt 5 of 5. Reason: 
HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries 
exceeded with url: 
/computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused 
by NameResolutionError(": Failed to resolve 'metadata.google.internal' ([Errno 
-2] Name or service not known)"))
   [2024-03-01, 17:18:06 UTC] {retry.py:157} ERROR - _request non-retriable 
exception: Anonymous caller does not have storage.objects.create access to the 
Google Cloud Storage object. Permission 'storage.objects.create' denied on 
resource (or it may not exist)., 401
   Traceback (most recent call last):
     File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 123, 
in retry_request
       return await func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 430, in 
_request
       validate_response(status, contents, path, args)
     File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 110, 
in validate_response
       raise HttpError(error)
   gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create 
access to the Google Cloud Storage object. Permission 'storage.objects.create' 
denied on resource (or it may not exist)., 401
   [2024-03-01, 17:18:06 UTC] {taskinstance.py:2774} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 
447, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 
417, in _execute_callable
       return execute_callable(context=context, **execute_callable_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/airflow/decorators/base.py", 
line 238, in execute
       return_value = super().execute(context)
                      ^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 
200, in execute
       return_value = self.execute_callable()
                      ^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 
217, in execute_callable
       return self.python_callable(*self.op_args, **self.op_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/airflow/dags/obj_storage1.py", line 36, in store_data
       with path.open("wb") as file:
     File "/usr/local/lib/python3.11/site-packages/fsspec/spec.py", line 1965, 
in __exit__
       self.close()
     File "/usr/local/lib/python3.11/site-packages/fsspec/spec.py", line 1932, 
in close
       self.flush(force=True)
     File "/usr/local/lib/python3.11/site-packages/fsspec/spec.py", line 1798, 
in flush
       self._initiate_upload()
     File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 1799, 
in _initiate_upload
       self.location = sync(
                       ^^^^^
     File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 103, 
in sync
       raise return_result
     File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in 
_runner
       result[0] = await coro
                   ^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 1916, 
in initiate_upload
       headers, _ = await fs._call(
                    ^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 437, in 
_call
       status, headers, info, contents = await self._request(
                                         ^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/decorator.py", line 221, in 
fun
       return await caller(func, *(extras + args), **kw)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 158, 
in retry_request
       raise e
     File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 123, 
in retry_request
       return await func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/site-packages/gcsfs/core.py", line 430, in 
_request
       validate_response(status, contents, path, args)
     File "/usr/local/lib/python3.11/site-packages/gcsfs/retry.py", line 110, 
in validate_response
       raise HttpError(error)
   gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create 
access to the Google Cloud Storage object. Permission 'storage.objects.create' 
denied on resource (or it may not exist)., 401
   [2024-03-01, 17:18:06 UTC] {taskinstance.py:1168} INFO - Marking task as 
FAILED. dag_id=gcs_objectstorage, task_id=store_data, 
execution_date=20240301T171758, start_date=20240301T171759, 
end_date=20240301T171806
   [2024-03-01, 17:18:06 UTC] {standard_task_runner.py:107} ERROR - Failed to 
execute job 150 for task store_data (Anonymous caller does not have 
storage.objects.create access to the Google Cloud Storage object. Permission 
'storage.objects.create' denied on resource (or it may not exist)., 401; 194)
   [2024-03-01, 17:18:06 UTC] {local_task_job_runner.py:234} INFO - Task exited 
with return code 1
   [2024-03-01, 17:18:06 UTC] {taskinstance.py:3357} INFO - 0 downstream tasks 
scheduled from follow-on schedule check
   ```
   
   
   ### What you think should happen instead?
   
   The DAG above should be able to use the credentials from the Airflow connection; it should not require exporting the env var `GOOGLE_APPLICATION_CREDENTIALS`.
   
   ### How to reproduce
   
   Run the DAG above.
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

