vchiapaikeo commented on issue #32870: URL: https://github.com/apache/airflow/issues/32870#issuecomment-1776238735
There's actually some inconsistency here: `project_id` doesn't always mean the project where BQ compute occurs. For example, in BigQueryUpsertTableOperator, the [project_id field refers to a storage location](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/hooks/bigquery.py#L1164C28-L1164C75). But in BigQueryInsertJobOperator, it [refers to where BQ compute occurs](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/bigquery.py#L2685).

Another reason I'm hesitant to do this is that the [BigQueryHook](https://github.com/apache/airflow/blob/f457228bd21b13a7fdb29e154202d5357d024050/airflow/providers/google/cloud/hooks/bigquery.py#L79) itself does not expose project_id as one of its parameters. And if you look through the hook code, the project_id parameter in the hook's methods does not necessarily refer to the compute project; for most methods, it is both the compute and the storage project. So by changing this for BigQueryValueCheckOperator alone, we'd potentially be introducing even more inconsistency.

I think the path forward is to deprecate many of these calls that are based on the discovery API and use the [BigQuery client](https://github.com/apache/airflow/blob/f457228bd21b13a7fdb29e154202d5357d024050/airflow/providers/google/cloud/hooks/bigquery.py#L151-L162) object instead, which lets us distinguish the two types of projects (compute and storage) more clearly. IMO it's more of a refactor than a one-time fix for this specific operator.
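To make the compute/storage distinction concrete, here is a minimal sketch. `BigQueryTarget` and `resolve_target` are hypothetical names for illustration only; they are not part of the Airflow Google provider or the BigQuery client library. The sketch assumes the standard `project.dataset.table` table ID format and, when the project prefix is omitted, falls back to the compute project, which is the "same project for both" case that most hook methods implicitly assume today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigQueryTarget:
    """Hypothetical value object that keeps the two kinds of project apart."""

    compute_project: str  # project that runs (and is billed for) the job
    storage_project: str  # project that owns the dataset and table
    dataset_id: str
    table_id: str

    @property
    def qualified_table(self) -> str:
        # Fully qualified ID, as it would appear in a query's FROM clause.
        return f"{self.storage_project}.{self.dataset_id}.{self.table_id}"


def resolve_target(table: str, compute_project: str) -> BigQueryTarget:
    """Split a 'project.dataset.table' or 'dataset.table' ID (hypothetical helper).

    When the project prefix is missing, the storage project defaults to the
    compute project, i.e. the case where the two coincide.
    """
    parts = table.split(".")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"Expected 'project.dataset.table' or 'dataset.table', got {table!r}")
    if len(parts) == 3:
        storage_project, dataset_id, table_id = parts
    else:
        storage_project = compute_project
        dataset_id, table_id = parts
    return BigQueryTarget(compute_project, storage_project, dataset_id, table_id)


if __name__ == "__main__":
    # Query data owned by "shared-data" while running the job in "team-compute".
    target = resolve_target("shared-data.sales.orders", compute_project="team-compute")
    sql = f"SELECT COUNT(*) FROM `{target.qualified_table}`"
    print(target.compute_project, sql)
```

With the google-cloud-bigquery client, the compute project would correspond to the project the client (or job) is created with, e.g. `bigquery.Client(project=...)`, while the storage project only appears inside the fully qualified table ID. Keeping them as separate fields avoids overloading a single `project_id` with both meanings.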
