MichailParaskevopoulos opened a new issue #21600:
URL: https://github.com/apache/airflow/issues/21600


   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==6.3.0
   
   ### Apache Airflow version
   
   2.1.1
   
   ### Operating System
   
   Debian 10
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   The operator `airflow.providers.google.cloud.operators.bigquery.BigQueryCreateEmptyDatasetOperator` fails when the GCP project ID can't be determined from the environment:
   
   ```
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/bigquery.py", line 1429, in execute
       bq_hook.create_empty_dataset(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 425, in inner_wrapper
       return func(self, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 468, in create_empty_dataset
       self.get_client(location=location).create_dataset(dataset=dataset, exists_ok=exists_ok)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 156, in get_client
       return Client(
     File "/home/airflow/.local/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 209, in __init__
       super(Client, self).__init__(
     File "/home/airflow/.local/lib/python3.8/site-packages/google/cloud/client.py", line 318, in __init__
       _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
     File "/home/airflow/.local/lib/python3.8/site-packages/google/cloud/client.py", line 269, in __init__
       raise EnvironmentError(
   OSError: Project was not passed and could not be determined from the environment.
   ```
   
   I've tried the operator both by providing `project_id` and `dataset_id` and by providing a `dataset_reference`. In both cases, the expected dataset name and project ID are printed in the logs during the hook's execution, right before the `get_client` method is invoked.
   
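   When the `get_client` method is called from `airflow.providers.google.cloud.hooks.bigquery.BigQueryHook.create_empty_dataset`, the `project_id` is not passed to it, so the BigQuery `Client` has to determine the project from the environment and raises the error above when it can't. I think this is the root cause of the error.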
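The failure mode can be modelled without Airflow or GCP. In the minimal sketch below, `FakeClient`, `get_client`, `create_empty_dataset_buggy` and `create_empty_dataset_fixed` are hypothetical stand-ins, not the provider's code, and the environment fallback is simplified to a single `GOOGLE_CLOUD_PROJECT` lookup (the real client resolves the project through its credentials and environment). It shows that dropping `project_id` between the hook method and `get_client` reproduces the `OSError`, while forwarding it avoids the environment lookup entirely.

```python
import os
from typing import Optional


class FakeClient:
    """Hypothetical stand-in for the BigQuery client's project resolution."""

    def __init__(self, project: Optional[str] = None) -> None:
        # Simplified fallback: only one environment variable is consulted here.
        project = project or os.environ.get("GOOGLE_CLOUD_PROJECT")
        if project is None:
            raise OSError(
                "Project was not passed and could not be determined from the environment."
            )
        self.project = project


def get_client(project_id: Optional[str] = None) -> FakeClient:
    # Mirrors a hook helper that builds a client for an optional project.
    return FakeClient(project=project_id)


def create_empty_dataset_buggy(project_id: str) -> str:
    # project_id is known here (and could be logged) but is never forwarded.
    return get_client().project


def create_empty_dataset_fixed(project_id: str) -> str:
    # Forwarding the resolved project ID means the environment is never consulted.
    return get_client(project_id=project_id).project
```

Assuming the hook's `get_client` accepts a `project_id` argument in this provider version, the corresponding change in `create_empty_dataset` would presumably be along the lines of `self.get_client(project_id=project_id, location=location)`.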
   
   ### What you expected to happen
   
   I expected the `project_id` that I provide to the `airflow.providers.google.cloud.operators.bigquery.BigQueryCreateEmptyDatasetOperator` operator to be passed through to the BigQuery client.
   
   ### How to reproduce
   
   Run `BigQueryCreateEmptyDatasetOperator` with an explicit `project_id` (or a `dataset_reference` containing one) in an environment where the project ID is not set.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   