vchiapaikeo commented on issue #32870:
URL: https://github.com/apache/airflow/issues/32870#issuecomment-1775458002
> Can you please advise how to get gcp_conn_id? I only used the default value so far. Thanks!
So here's a simple DAG example:
```py
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryValueCheckOperator

DEFAULT_TASK_ARGS = {
    "owner": "gcp-data-platform",
    "start_date": "2023-03-13",
    "retries": 1,
    "retry_delay": 300,
}

with DAG(
    schedule_interval="@daily",
    max_active_runs=1,
    max_active_tasks=5,
    catchup=False,
    dag_id="test_bigquery_value_check",
    default_args=DEFAULT_TASK_ARGS,
) as dag:
    value_check_on_same_project_without_impersonation = BigQueryValueCheckOperator(
        task_id="value_check_on_same_project_without_impersonation",
        sql="select count(1) from `airflow-vchiapaikeo.test.table1`",
        pass_value=1,
        tolerance=0.15,
        use_legacy_sql=False,
        location="US",
        gcp_conn_id="google_cloud_default",
        # deferrable=True,
        impersonation_chain=["[email protected]"],
    )

    value_check_on_diff_project_with_impersonation = BigQueryValueCheckOperator(
        task_id="value_check_on_diff_project_without_impersonation_expect_fail",
        sql="select count(1) from `airflow2-vchiapaikeo.test.table1`",
        pass_value=1,
        tolerance=0.15,
        use_legacy_sql=False,
        location="US",
        gcp_conn_id="google_cloud_default2",
        # deferrable=True,
        impersonation_chain=["[email protected]"],
    )
```
I defined two different gcp_conn_ids: one with project A and the other with project B.
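For anyone wondering how two such connections might be set up, here's a minimal sketch using Airflow's `AIRFLOW_CONN_<CONN_ID>` environment-variable convention with the JSON connection format (connections can equally be created in the UI or with `airflow connections add`). The project ids mirror the ones in the SQL above; treat the exact extras layout as an assumption and check it against your Airflow version:

```python
import json
import os

# Hypothetical sketch: define the two GCP connections used in the DAG above
# via environment variables. Airflow maps AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT2
# to the conn_id "google_cloud_default2" (upper-cased, prefixed).
connections = {
    "AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT": {
        "conn_type": "google-cloud-platform",
        # "extra" is itself a JSON-encoded string in this format.
        "extra": json.dumps({"project": "airflow-vchiapaikeo"}),
    },
    "AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT2": {
        "conn_type": "google-cloud-platform",
        "extra": json.dumps({"project": "airflow2-vchiapaikeo"}),
    },
}

for name, payload in connections.items():
    os.environ[name] = json.dumps(payload)

print(os.environ["AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT2"])
```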
<img width="1440" alt="image"
src="https://github.com/apache/airflow/assets/9200263/def44dd2-99b0-42cf-aaf1-074fae41bb7f">
You can see the second operator gets executed in the correct project and
with the correct service account here:
<img width="1299" alt="image"
src="https://github.com/apache/airflow/assets/9200263/ea2c7d1d-2267-454d-a6a1-92f16adc58e9">
> And as I read the source code of other operators, they use a hook to pass impersonation chain, and send the request via the hook, instead of send the request directly. I guess this might be the reason? Is that possible to use a hook as well in this operator as well?
This is a little complicated, actually, and I don't fully understand all of it. Part of the hook uses the [soon-to-be-deprecated discovery API](https://github.com/apache/airflow/blob/789222cb1378079e2afd24c70c1a6783b57e27e6/airflow/providers/google/cloud/hooks/bigquery.py#L149) while the other part uses the [BigQuery client](https://github.com/apache/airflow/blob/789222cb1378079e2afd24c70c1a6783b57e27e6/airflow/providers/google/cloud/hooks/bigquery.py#L37). The part that uses the discovery API infers the project id from the gcp_conn_id connection. The common code shared with DbApiHook probably needs to be refactored to move away from the discovery API and onto the BigQuery client... but that will be quite difficult 😓. Please correct me if I'm wrong, anyone who knows this code better than I do.
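To make the "infers the project id from the gcp_conn_id connection" point concrete, here is a simplified, purely illustrative sketch of that fallback order (explicit argument, then the connection's extras, then the credentials' default project). The function name and shape are made up for illustration; the real logic lives inside the hook classes:

```python
# Hypothetical sketch of the project-id inference on the discovery-API path.
# Not the actual hook code: names and structure are invented for illustration.

def resolve_project_id(explicit=None, conn_extra=None, credentials_default=None):
    """Return the first available project id, mimicking the fallback order:
    explicit argument -> connection extras -> credentials default."""
    candidates = (
        explicit,
        (conn_extra or {}).get("project"),
        credentials_default,
    )
    for candidate in candidates:
        if candidate:
            return candidate
    raise ValueError("No project id could be inferred")

# With no explicit project, the gcp_conn_id's extras win over the
# credentials' default project:
print(resolve_project_id(
    conn_extra={"project": "airflow2-vchiapaikeo"},
    credentials_default="airflow-vchiapaikeo",
))  # -> airflow2-vchiapaikeo
```

This is why pointing `gcp_conn_id` at a connection configured for project B makes the job run in project B even when the worker's ambient credentials default to project A.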