bryan824 opened a new pull request, #23141: URL: https://github.com/apache/airflow/pull/23141
Recently, during a migration from 1.10.14 to 2.2.3, I noticed an issue in the `BigQueryDeleteTableOperator`.

For context: there are two ways to specify a table in GCP BigQuery, one with the `project_id`, like `my-project.mydataset.mytable`, and one without, like `mydataset.mytable`. In 1.10.14, I was using the form without `project_id`, because `BigQueryHook` could still resolve the table by using `bigquery_conn_id` to fetch the `project_id` from the connection configuration. The path this info travels is: [gcp_api_base_hook#L131](https://github.com/apache/airflow/blob/c743b95a02ba1ec04013635a56ad042ce98823d2/airflow/contrib/hooks/gcp_api_base_hook.py#L131) -> [gcp_api_base_hook#L200](https://github.com/apache/airflow/blob/c743b95a02ba1ec04013635a56ad042ce98823d2/airflow/contrib/hooks/gcp_api_base_hook.py#L200) -> [bigquery_hook#L71](https://github.com/apache/airflow/blob/c743b95a02ba1ec04013635a56ad042ce98823d2/airflow/contrib/hooks/bigquery_hook.py#L71) -> [bigquery_hook#L1498](https://github.com/apache/airflow/blob/c743b95a02ba1ec04013635a56ad042ce98823d2/airflow/contrib/hooks/bigquery_hook.py#L1498).

After upgrading to 2.2.3, however, a full `table_id` is required. This is unexpected: since `bigquery_conn_id`/`gcp_conn_id` is still a valid parameter, `BigQueryDeleteTableOperator` should still be able to get the `project_id` automatically from the connection configuration. The **root cause** seems to be this line of code, [bigquery#L1195](https://github.com/apache/airflow/blob/eb26510d3a1ccfaa9e4f8e1e0c91b5c74ae7393e/airflow/providers/google/cloud/hooks/bigquery.py#L1195): it forces users to pass a full `table_id` in order to create a `Table` instance. The client's `delete_table` method accepts four table types (`Table`, `TableReference`, `TableListItem`, and `str`), as shown in [client#L1754](https://github.com/googleapis/python-bigquery/blob/c1d3e3089de1c267f8fb013283289b7d42172c76/google/cloud/bigquery/client.py#L1754).
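To illustrate the root cause, here is a pure-Python sketch of the strict parsing that `Table.from_string` effectively applies (the helper name below is mine, not the library's): without a default project to fall back on, the short `dataset.table` form must be rejected.

```python
# Illustrative sketch only, not the actual google-cloud-bigquery code:
# parsing a table ID with no default project available rejects the
# two-part 'dataset.table' form, which is why a full table_id is forced.
def parse_table_id(table_id):
    parts = table_id.split(".")
    if len(parts) != 3:
        raise ValueError(
            f"{table_id!r} is not a fully-qualified 'project.dataset.table' ID"
        )
    project, dataset, table = parts
    return project, dataset, table


print(parse_table_id("my-project.mydataset.mytable"))
# ('my-project', 'mydataset', 'mytable')

try:
    parse_table_id("mydataset.mytable")
except ValueError as exc:
    # The short form fails here even though the connection knows the project.
    print(exc)
```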
Then, in [client#L1784](https://github.com/googleapis/python-bigquery/blob/c1d3e3089de1c267f8fb013283289b7d42172c76/google/cloud/bigquery/client.py#L1784), these four types are converted into a single type, `TableReference`, as shown in [table#L2689](https://github.com/googleapis/python-bigquery/blob/c1d3e3089de1c267f8fb013283289b7d42172c76/google/cloud/bigquery/table.py#L2689).

So, back to the possible improvement: I wonder whether the migration would be smoother if, instead of using `Table.from_string` to build a `Table`, the `str` parameter were passed through directly. That `str` can then be just `mydataset.mytable`, with the `project_id` supplied by the `Client`, as shown in [bigquery#L1194](https://github.com/apache/airflow/blob/8dedd2ac13a6cdc0c363446985f492e0f702f639/airflow/providers/google/cloud/hooks/bigquery.py#L1194).

Given [GCP's version-support plan](https://cloud.google.com/composer/docs/composer-2/composer-versioning-overview#version-support-for-composer-1), companies are gradually migrating to Airflow 2.0 for better support. This improvement would spare them from adding the `project_id` to the `table_id` in hundreds of DAGs, since it is already included in the connection configuration.

Below are two scenarios, based on the two formats for specifying a BigQuery table:

1. A `table_id` like `mydataset.mytable` is passed in [bigquery#L1797](https://github.com/apache/airflow/blob/8dedd2ac13a6cdc0c363446985f492e0f702f639/airflow/providers/google/cloud/operators/bigquery.py#L1797), and the corresponding `project_id` is configured on the connection. This works as expected; if no `project_id` is found at all, the error is caught in [_helpers#L825](https://github.com/googleapis/python-bigquery/blob/c1d3e3089de1c267f8fb013283289b7d42172c76/google/cloud/bigquery/_helpers.py#L825).
2. A `table_id` like `my-project.mydataset.mytable` is passed.
In this case, whether or not a `project_id` is configured on the connection, the `project_id` embedded in the `table_id` takes precedence, as shown in [_helpers#L836](https://github.com/googleapis/python-bigquery/blob/c1d3e3089de1c267f8fb013283289b7d42172c76/google/cloud/bigquery/_helpers.py#L836).

This is my first attempt at submitting a PR to an open-source repo, so please let me know how I can improve. It is also fine if this change is not worth merging; I enjoyed the time spent looking into it. @kaxil @eladkal @potiuk
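The resolution behaviour described in the two scenarios above can be sketched in pure Python (the helper name is hypothetical; the real conversion happens in the linked `_helpers` code): a two-part ID is completed with the client's default project, while a three-part ID keeps its embedded project regardless of the default.

```python
# Illustrative sketch only, not the actual library code: how a plain-string
# table argument is resolved against the client's default project.
def resolve_table_id(table_id, default_project):
    parts = table_id.split(".")
    if len(parts) == 2:
        # Scenario 1: 'mydataset.mytable' gets the connection's project.
        return f"{default_project}.{table_id}"
    if len(parts) == 3:
        # Scenario 2: the project embedded in table_id wins over the default.
        return table_id
    raise ValueError(f"unexpected table ID: {table_id!r}")


print(resolve_table_id("mydataset.mytable", "my-project"))
# my-project.mydataset.mytable
print(resolve_table_id("other-project.mydataset.mytable", "my-project"))
# other-project.mydataset.mytable
```

With this behaviour, existing DAGs that pass `mydataset.mytable` would keep working after migration, because the project already lives in the connection configuration.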
