foldedverse opened a new issue, #31986:
URL: https://github.com/apache/airflow/issues/31986

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   I encountered an issue while running an AutoML task. The task failed with an 
authentication error due to the inability to find the project ID. Here are the 
details of the error:
   
   ```
   [2023-06-17T18:42:48.916+0530] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
   [2023-06-17T18:42:48.931+0530] {taskinstance.py:1327} INFO - Executing <Task(CreateAutoMLTabularTrainingJobOperator): auto_ml_tabular_task> on 2023-06-17 13:12:33+00:00
   [2023-06-17T18:42:48.964+0530] {standard_task_runner.py:57} INFO - Started process 12974 to run task
   [2023-06-17T18:42:48.971+0530] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'vi_create_auto_ml_tabular_training_job_dag', 'auto_ml_tabular_task', 'manual__2023-06-17T13:12:33+00:00', '--job-id', '175', '--raw', '--subdir', 'DAGS_FOLDER/vi_create_model_train.py', '--cfg-path', '/tmp/tmprijpfzql']
   [2023-06-17T18:42:48.974+0530] {standard_task_runner.py:85} INFO - Job 175: Subtask auto_ml_tabular_task
   [2023-06-17T18:42:49.043+0530] {logging_mixin.py:149} INFO - Changing /mnt/d/projects/airflow/logs/dag_id=vi_create_auto_ml_tabular_training_job_dag/run_id=manual__2023-06-17T13:12:33+00:00/task_id=auto_ml_tabular_task permission to 509
   [2023-06-17T18:42:49.044+0530] {task_command.py:410} INFO - Running <TaskInstance: vi_create_auto_ml_tabular_training_job_dag.auto_ml_tabular_task manual__2023-06-17T13:12:33+00:00 [running]> on host DESKTOP-EIFUHU2.localdomain
   [2023-06-17T18:42:49.115+0530] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='vi_create_auto_ml_tabular_training_job_dag' AIRFLOW_CTX_TASK_ID='auto_ml_tabular_task' AIRFLOW_CTX_EXECUTION_DATE='2023-06-17T13:12:33+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-06-17T13:12:33+00:00'
   [2023-06-17T18:42:49.120+0530] {base.py:73} INFO - Using connection ID 'gcp_conn' for task execution.
   [2023-06-17T18:42:52.123+0530] {_metadata.py:141} WARNING - Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
   [2023-06-17T18:42:55.125+0530] {_metadata.py:141} WARNING - Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
   [2023-06-17T18:42:58.128+0530] {_metadata.py:141} WARNING - Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
   [2023-06-17T18:42:58.129+0530] {_default.py:340} WARNING - Authentication failed using Compute Engine authentication due to unavailable metadata server.
   [2023-06-17T18:42:58.131+0530] {taskinstance.py:1824} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/google/cloud/aiplatform/initializer.py", line 244, in project
       self._set_project_as_env_var_or_google_auth_default()
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/google/cloud/aiplatform/initializer.py", line 81, in _set_project_as_env_var_or_google_auth_default
       credentials, project = google.auth.default()
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/google/auth/_default.py", line 692, in default
       raise exceptions.DefaultCredentialsError(_CLOUD_SDK_MISSING_CREDENTIALS)
   google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/vertex_ai/auto_ml.py", line 359, in execute
       dataset=datasets.TabularDataset(dataset_name=self.dataset_id),
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/google/cloud/aiplatform/datasets/dataset.py", line 77, in __init__
       super().__init__(
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/google/cloud/aiplatform/base.py", line 925, in __init__
       VertexAiResourceNoun.__init__(
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/google/cloud/aiplatform/base.py", line 507, in __init__
       self.project = project or initializer.global_config.project
     File "/mnt/d/projects/tvenv/lib/python3.8/site-packages/google/cloud/aiplatform/initializer.py", line 247, in project
       raise GoogleAuthError(project_not_found_exception_str) from exc
   google.auth.exceptions.GoogleAuthError: Unable to find your project. Please provide a project ID by:
   - Passing a constructor argument
   - Using aiplatform.init()
   - Setting project using 'gcloud config set project my-project'
   - Setting a GCP environment variable
   [2023-06-17T18:42:58.139+0530] {taskinstance.py:1345} INFO - Marking task as FAILED. dag_id=vi_create_auto_ml_tabular_training_job_dag, task_id=auto_ml_tabular_task, execution_date=20230617T131233, start_date=20230617T131248, end_date=20230617T131258
   [2023-06-17T18:42:58.152+0530] {standard_task_runner.py:104} ERROR - Failed to execute job 175 for task auto_ml_tabular_task (Unable to find your project. Please provide a project ID by:
   - Passing a constructor argument
   - Using aiplatform.init()
   - Setting project using 'gcloud config set project my-project'
   - Setting a GCP environment variable; 12974)
   [2023-06-17T18:42:58.166+0530] {local_task_job_runner.py:225} INFO - Task exited with return code 1
   [2023-06-17T18:42:58.183+0530] {taskinstance.py:2651} INFO - 0 downstream tasks scheduled from follow-on schedule check
   ```
   
   ### What you think should happen instead
   
   Expected behavior:
   The AutoML task should execute successfully, authenticating with the project ID and credentials from the connection referenced by the `gcp_conn_id` provided in the DAG.
   
   Actual behavior:
   The task fails with an authentication error because neither the project ID nor default credentials can be found.
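   
   As a stop-gap while the operator does not forward the connection's credentials, the SDK's fallbacks can be satisfied through the environment. This is only a sketch under the assumption that a service-account key file is available; the path and project ID below are placeholders, not values from this report:

```python
import os

# Placeholders -- substitute a real key file path and project ID.
# GOOGLE_APPLICATION_CREDENTIALS is the standard Application Default
# Credentials override; GOOGLE_CLOUD_PROJECT supplies the project ID
# that the aiplatform initializer failed to discover in the log above.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"
os.environ["GOOGLE_CLOUD_PROJECT"] = "my-project"

print(os.environ["GOOGLE_CLOUD_PROJECT"])
```

   Both variables must be set before the Airflow worker process starts, since `google.auth.default()` reads them at call time in the task process.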
   
   ### How to reproduce
   
   To reproduce the issue and execute the CreateAutoMLTabularTrainingJobOperator task in Apache Airflow, follow these steps:
   
   Ensure that Apache Airflow and the Google provider are installed. If not, run the following commands to install them:
   ```
   pip install apache-airflow
   pip install apache-airflow-providers-google
   ```
   Create an instance of the CreateAutoMLTabularTrainingJobOperator within the DAG context:
   ```
   create_auto_ml_tabular_training_job = CreateAutoMLTabularTrainingJobOperator(
       gcp_conn_id='gcp_conn',
       task_id="auto_ml_tabular_task",
       display_name=TABULAR_DISPLAY_NAME,
       optimization_prediction_type="regression",
       optimization_objective="minimize-rmse",
       # column_transformations=COLUMN_TRANSFORMATIONS,
       dataset_id=tabular_dataset_id,
       target_column="mean_temp",
       training_fraction_split=0.8,
       validation_fraction_split=0.1,
       test_fraction_split=0.1,
       model_display_name='your-model-display-name',
       disable_early_stopping=False,
       region=REGION,
       project_id=PROJECT_ID
   )
   ```
   
   Start the Apache Airflow scheduler and webserver. Open a terminal or command 
prompt and run the following commands:
   
   ```
   # Start the scheduler
   airflow scheduler
   
   # Start the webserver
   airflow webserver
   ```
   
   Access the Apache Airflow web UI by opening a web browser and navigating to 
http://localhost:8080. Ensure that the scheduler and webserver are running 
without any errors.
   
   Navigate to the DAGs page in the Airflow UI and locate the 
vi_create_auto_ml_tabular_training_job_dag DAG. Trigger the DAG manually, 
either by clicking the "Trigger DAG" button or using the Airflow CLI command.
   
   Monitor the DAG execution status and check if the auto_ml_tabular_task 
completes successfully or encounters any errors.
   
   ### Operating System
   
   DISTRIB_ID=Ubuntu DISTRIB_RELEASE=20.04 DISTRIB_CODENAME=focal 
DISTRIB_DESCRIPTION="Ubuntu 20.04 LTS"
   
   ### Versions of Apache Airflow Providers
   
   $ pip freeze
   aiofiles==23.1.0
   aiohttp==3.8.4
   aiosignal==1.3.1
   alembic==1.11.1
   anyio==3.7.0
   apache-airflow==2.6.1
   apache-airflow-providers-common-sql==1.5.1
   apache-airflow-providers-ftp==3.4.1
   apache-airflow-providers-google==10.1.1
   apache-airflow-providers-http==4.4.1
   apache-airflow-providers-imap==3.2.1
   apache-airflow-providers-sqlite==3.4.1
   apispec==5.2.2
   argcomplete==3.1.1
   asgiref==3.7.2
   async-timeout==4.0.2
   attrs==23.1.0
   Babel==2.12.1
   backoff==2.2.1
   blinker==1.6.2
   cachelib==0.9.0
   cachetools==5.3.1
   cattrs==23.1.2
   certifi==2023.5.7
   cffi==1.15.1
   chardet==5.1.0
   charset-normalizer==3.1.0
   click==8.1.3
   clickclick==20.10.2
   colorama==0.4.6
   colorlog==4.8.0
   ConfigUpdater==3.1.1
   connexion==2.14.2
   cron-descriptor==1.4.0
   croniter==1.4.1
   cryptography==41.0.1
   db-dtypes==1.1.1
   Deprecated==1.2.14
   dill==0.3.6
   dnspython==2.3.0
   docutils==0.20.1
   email-validator==1.3.1
   exceptiongroup==1.1.1
   Flask==2.2.5
   Flask-AppBuilder==4.3.0
   Flask-Babel==2.0.0
   Flask-Caching==2.0.2
   Flask-JWT-Extended==4.5.2
   Flask-Limiter==3.3.1
   Flask-Login==0.6.2
   flask-session==0.5.0
   Flask-SQLAlchemy==2.5.1
   Flask-WTF==1.1.1
   frozenlist==1.3.3
   future==0.18.3
   gcloud-aio-auth==4.2.1
   gcloud-aio-bigquery==6.3.0
   gcloud-aio-storage==8.2.0
   google-ads==21.2.0
   google-api-core==2.11.1
   google-api-python-client==2.89.0
   google-auth==2.20.0
   google-auth-httplib2==0.1.0
   google-auth-oauthlib==1.0.0
   google-cloud-aiplatform==1.26.0
   google-cloud-appengine-logging==1.3.0
   google-cloud-audit-log==0.2.5
   google-cloud-automl==2.11.1
   google-cloud-bigquery==3.11.1
   google-cloud-bigquery-datatransfer==3.11.1
   google-cloud-bigquery-storage==2.20.0
   google-cloud-bigtable==2.19.0
   google-cloud-build==3.16.0
   google-cloud-compute==1.11.0
   google-cloud-container==2.24.0
   google-cloud-core==2.3.2
   google-cloud-datacatalog==3.13.0
   google-cloud-dataflow-client==0.8.3
   google-cloud-dataform==0.5.1
   google-cloud-dataplex==1.5.0
   google-cloud-dataproc==5.4.1
   google-cloud-dataproc-metastore==1.11.0
   google-cloud-dlp==3.12.1
   google-cloud-kms==2.17.0
   google-cloud-language==2.10.0
   google-cloud-logging==3.5.0
   google-cloud-memcache==1.7.1
   google-cloud-monitoring==2.15.0
   google-cloud-orchestration-airflow==1.9.0
   google-cloud-os-login==2.9.1
   google-cloud-pubsub==2.17.1
   google-cloud-redis==2.13.0
   google-cloud-resource-manager==1.10.1
   google-cloud-secret-manager==2.16.1
   google-cloud-spanner==3.36.0
   google-cloud-speech==2.20.0
   google-cloud-storage==2.9.0
   google-cloud-tasks==2.13.1
   google-cloud-texttospeech==2.14.1
   google-cloud-translate==3.11.1
   google-cloud-videointelligence==2.11.2
   google-cloud-vision==3.4.2
   google-cloud-workflows==1.10.1
   google-crc32c==1.5.0
   google-resumable-media==2.5.0
   googleapis-common-protos==1.59.1
   graphviz==0.20.1
   greenlet==2.0.2
   grpc-google-iam-v1==0.12.6
   grpcio==1.54.2
   grpcio-gcp==0.2.2
   grpcio-status==1.54.2
   gunicorn==20.1.0
   h11==0.14.0
   httpcore==0.17.2
   httplib2==0.22.0
   httpx==0.24.1
   idna==3.4
   importlib-metadata==4.13.0
   importlib-resources==5.12.0
   inflection==0.5.1
   itsdangerous==2.1.2
   Jinja2==3.1.2
   json-merge-patch==0.2
   jsonschema==4.17.3
   lazy-object-proxy==1.9.0
   limits==3.5.0
   linkify-it-py==2.0.2
   lockfile==0.12.2
   looker-sdk==23.10.0
   Mako==1.2.4
   Markdown==3.4.3
   markdown-it-py==3.0.0
   MarkupSafe==2.1.3
   marshmallow==3.19.0
   marshmallow-enum==1.5.1
   marshmallow-oneofschema==3.0.1
   marshmallow-sqlalchemy==0.26.1
   mdit-py-plugins==0.4.0
   mdurl==0.1.2
   multidict==6.0.4
   numpy==1.24.3
   oauthlib==3.2.2
   ordered-set==4.1.0
   packaging==23.1
   pandas==2.0.2
   pandas-gbq==0.19.2
   pathspec==0.9.0
   pendulum==2.1.2
   pkgutil-resolve-name==1.3.10
   pluggy==1.0.0
   prison==0.2.1
   proto-plus==1.22.2
   protobuf==4.23.3
   psutil==5.9.5
   pyarrow==12.0.1
   pyasn1==0.4.8
   pyasn1-modules==0.2.8
   pycparser==2.21
   pydantic==1.10.9
   pydata-google-auth==1.8.0
   Pygments==2.15.1
   PyJWT==2.7.0
   pyOpenSSL==23.2.0
   pyparsing==3.0.9
   pyrsistent==0.19.3
   python-daemon==3.0.1
   python-dateutil==2.8.2
   python-nvd3==0.15.0
   python-slugify==8.0.1
   pytz==2023.3
   pytzdata==2020.1
   PyYAML==6.0
   requests==2.31.0
   requests-oauthlib==1.3.1
   requests-toolbelt==1.0.0
   rfc3339-validator==0.1.4
   rich==13.4.2
   rich-argparse==1.1.1
   rsa==4.9
   setproctitle==1.3.2
   Shapely==1.8.5.post1
   six==1.16.0
   sniffio==1.3.0
   SQLAlchemy==1.4.48
   sqlalchemy-bigquery==1.6.1
   SQLAlchemy-JSONField==1.0.1.post0
   SQLAlchemy-Utils==0.41.1
   sqlparse==0.4.4
   tabulate==0.9.0
   tenacity==8.2.2
   termcolor==2.3.0
   text-unidecode==1.3
   typing-extensions==4.6.3
   tzdata==2023.3
   uc-micro-py==1.0.2
   unicodecsv==0.14.1
   uritemplate==4.1.1
   urllib3==2.0.3
   Werkzeug==2.3.6
   wrapt==1.15.0
   WTForms==3.0.1
   yarl==1.9.2
   zipp==3.15.0
   
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   $ airflow info
   
   Apache Airflow
   version                | 2.6.1
   executor               | SequentialExecutor
   task_logging_handler   | airflow.utils.log.file_task_handler.FileTaskHandler
   sql_alchemy_conn       | sqlite:////home/test1/airflow/airflow.db
   dags_folder            | /mnt/d/projects/airflow/dags
   plugins_folder         | /home/test1/airflow/plugins                        
   base_log_folder        | /mnt/d/projects/airflow/logs
   remote_base_log_folder |
   
   
   System info
   OS              | Linux
   architecture    | x86_64
   uname           | uname_result(system='Linux', node='DESKTOP-EIFUHU2', 
release='4.4.0-19041-Microsoft', version='#1237-Microsoft Sat Sep 11 14:32:00 
PST      
                   | 2021', machine='x86_64', processor='x86_64')
   locale          | ('en_US', 'UTF-8')
   python_version  | 3.8.10 (default, May 26 2023, 14:05:08)  [GCC 9.4.0]
   python_location | /mnt/d/projects/tvenv/bin/python3
   
   
   Tools info
   git             | git version 2.25.1
   ssh             | OpenSSH_8.2p1 Ubuntu-4, OpenSSL 1.1.1f  31 Mar 2020
   kubectl         | NOT AVAILABLE
   gcloud          | NOT AVAILABLE
   cloud_sql_proxy | NOT AVAILABLE
   mysql           | NOT AVAILABLE                                      
   sqlite3         | NOT AVAILABLE
   psql            | NOT AVAILABLE
   
   
   Paths info
   airflow_home    | /home/test1/airflow
   system_path     | 
/mnt/d/projects/tvenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/mnt/c/Program
 Files     
                   | (x86)/Microsoft SDKs/Azure/CLI2/wbin:/mnt/c/Program 
Files/Python39/Scripts/:/mnt/c/Program
                   | 
Files/Python39/:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0/:/mnt/c/W
                   | 
indows/System32/OpenSSH/:/mnt/c/Users/ibrez/AppData/Roaming/nvm:/mnt/c/Program 
Files/nodejs:/mnt/c/Program
                   | 
Files/dotnet/:/mnt/c/Windows/system32/config/systemprofile/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/test1/AppData/Local/Microsoft/Wi
                   | ndowsApps:/snap/bin
   python_path     | 
/mnt/d/projects/tvenv/bin:/usr/lib/python38.zip:/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/mnt/d/projects/tvenv/lib/python3.8/site-p
                   | 
ackages:/mnt/d/projects/airflow/dags:/home/test1/airflow/config:/home/test1/airflow/plugins
   airflow_on_path | True
   
   
   Providers info
   apache-airflow-providers-common-sql | 1.5.1 
   apache-airflow-providers-ftp        | 3.4.1
   apache-airflow-providers-google     | 10.1.1
   apache-airflow-providers-http       | 4.4.1
   apache-airflow-providers-imap       | 3.2.1
   apache-airflow-providers-sqlite     | 3.4.1
   
   
   
   ### Anything else
   
   To me it seems like the issue is at this line in
   [auto_ml.py](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/vertex_ai/auto_ml.py):
   ```
   dataset=datasets.TabularDataset(dataset_name=self.dataset_id),
   ```
   
   Details like the project ID and credentials are not passed to the TabularDataset class, which causes issues down the line in
   ```
   GoogleAuthError(project_not_found_exception_str)
   ```
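   
   The shape of a possible fix (purely illustrative, not the actual provider code) is to forward the connection's project and credentials into the `TabularDataset` constructor so the SDK never has to fall back to Application Default Credentials. The toy class below only mimics that fallback behavior to show why the explicit arguments matter; it is not the real aiplatform class:

```python
# Toy stand-in for google.cloud.aiplatform.datasets.TabularDataset:
# when no explicit project/credentials are given, it falls back to an
# ADC-style lookup -- which is exactly what fails in the log above.
class FakeTabularDataset:
    def __init__(self, dataset_name, project=None, credentials=None):
        if project is None or credentials is None:
            # Mirrors google.auth.default() being consulted and failing.
            raise RuntimeError("would fall back to Application Default Credentials")
        self.dataset_name = dataset_name
        self.project = project

# Passing the connection's project/credentials explicitly avoids the fallback:
ds = FakeTabularDataset("1234", project="my-project", credentials=object())
print(ds.project)
```

   Roughly, the operator would instead call something like `datasets.TabularDataset(dataset_name=self.dataset_id, project=self.project_id, credentials=self.hook.get_credentials())` (hypothetical call; `project` and `credentials` keyword names assumed from the aiplatform constructor).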
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
