vchiapaikeo commented on issue #32870:
URL: https://github.com/apache/airflow/issues/32870#issuecomment-1776238735

   There's some inconsistency here: `project_id` doesn't always mean the 
project where BQ compute occurs. For example, in 
BigQueryUpsertTableOperator, the [project_id field refers to a storage 
location](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/hooks/bigquery.py#L1164C28-L1164C75).
 But in BigQueryInsertJobOperator, it [refers to where BQ compute 
occurs](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/bigquery.py#L2685).
 
   
   Another reason I'm hesitant to do this is that the 
[BigQueryHook](https://github.com/apache/airflow/blob/f457228bd21b13a7fdb29e154202d5357d024050/airflow/providers/google/cloud/hooks/bigquery.py#L79)
 itself does not expose project_id as one of its parameters. And if you look 
through the hook code, the project_id parameter on the hook's methods does 
not necessarily refer to the compute project: for most methods, it is both the 
compute and storage project. So by changing how project_id is used in 
BigQueryValueCheckOperator alone, we'd potentially be introducing even more 
inconsistency.
   
   I think the path forward here is to deprecate many of the calls 
that are based on the discovery API and instead use the [BigQuery 
client](https://github.com/apache/airflow/blob/f457228bd21b13a7fdb29e154202d5357d024050/airflow/providers/google/cloud/hooks/bigquery.py#L151-L162)
 object, which lets us distinguish these two types of project (compute and 
storage) more clearly. IMO that's more of a refactor than a one-time fix for 
this specific operator.
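
   To make the compute/storage distinction concrete, here is a rough sketch of 
how the two projects could be kept apart when building a query job. It is not 
code from the provider: `TableRef` and `build_query_job` are hypothetical 
helpers, and the project names are made up. The only things borrowed from 
BigQuery are the job resource field names (`jobReference.projectId`, 
`jobReference.location`, `configuration.query`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Where the data lives, i.e. the storage project (hypothetical helper)."""

    project: str
    dataset: str
    table: str

    def qualified(self) -> str:
        # Backtick-quoted, fully qualified table name for standard SQL.
        return f"`{self.project}.{self.dataset}.{self.table}`"


def build_query_job(compute_project: str, table: TableRef, location: str = "US") -> dict:
    """Build a BigQuery job resource body (hypothetical helper).

    jobReference.projectId names the project that runs the job (compute).
    The storage project appears only inside the SQL text.
    """
    return {
        "jobReference": {"projectId": compute_project, "location": location},
        "configuration": {
            "query": {
                "query": f"SELECT COUNT(*) FROM {table.qualified()}",
                "useLegacySql": False,
            }
        },
    }


# Assumed example values: compute in one project, data stored in another.
job = build_query_job("compute-proj", TableRef("storage-proj", "sales", "orders"))
print(job["jobReference"]["projectId"])
print(job["configuration"]["query"]["query"])
```

   With an explicit split like this, an operator would not have to overload a 
single project_id argument with both meanings.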


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
