TobiasHammarstrom opened a new issue, #35795: URL: https://github.com/apache/airflow/issues/35795
### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

The function [run_grant_dataset_view_access](https://airflow.apache.org/docs/apache-airflow-providers-google/10.10.1/_api/airflow/providers/google/cloud/hooks/bigquery/index.html#airflow.providers.google.cloud.hooks.bigquery.BigQueryHook.run_grant_dataset_view_access) does not correctly check whether the view has already been granted access to the dataset, which contradicts what the docs specify:

> If this view has already been granted access to the dataset, do nothing

### What you think should happen instead

The function should skip granting access if the view already has access to the dataset.

### How to reproduce

Use GCP Composer, create a DAG with a BigQueryHook, and call run_grant_dataset_view_access to grant a view in dataset A access to dataset B when the view already has access to dataset B.

### Operating System

composer-2.5.1-airflow-2.6.3

### Versions of Apache Airflow Providers

The ones included in the above installation (the provider in question is google-cloud-bigquery==3.12.0)

### Deployment

Google Cloud Composer

### Deployment details

_No response_

### Anything else

Investigation: The logs include "Granting table xxx authorized view access to xxx dataset.", which indicates that [this if-statement](https://github.com/apache/airflow/blob/b71c14c74a2715009bbd2134a81d32d3f41f7e1e/airflow/providers/google/cloud/hooks/bigquery.py#L1129) evaluates to True. When running a Python script locally with the same version of **google-cloud-bigquery**, I found that the AccessEntry object [fetched from the dataset](https://github.com/apache/airflow/blob/b71c14c74a2715009bbd2134a81d32d3f41f7e1e/airflow/providers/google/cloud/hooks/bigquery.py#L1126) does not match the [created AccessEntry object](https://github.com/apache/airflow/blob/b71c14c74a2715009bbd2134a81d32d3f41f7e1e/airflow/providers/google/cloud/hooks/bigquery.py#L1120).
The mismatch occurs in the `_properties` dict, where the fetched AccessEntry contains only one entry, _view_, whereas the created one contains two entries, _view_ and _role_. The following script shows the problem:

```python
from copy import deepcopy

from google.cloud import bigquery
from google.cloud.bigquery.dataset import AccessEntry


def main():
    client = bigquery.Client(project=<project>, location=<location>)
    dataset = client.get_dataset(<dataset>)
    access_entries = dataset.access_entries

    view_access = AccessEntry(
        role=None,
        entity_type="view",
        entity_id={
            "projectId": <project>,
            "datasetId": <dataset>,
            "tableId": <table>,
        },
    )

    view_access_no_role = AccessEntry(
        role=None,
        entity_type="view",
        entity_id={
            "projectId": <project>,
            "datasetId": <dataset>,
            "tableId": <table>,
        },
    )
    del view_access_no_role._properties['role']

    isInAccessEntries_v1 = view_access in access_entries  # False
    isInAccessEntries_v2 = view_access_no_role in access_entries  # True

    existing_view_access = access_entries[<index of existing view>]
    existing_view_access_fixed = deepcopy(existing_view_access)
    existing_view_access_fixed._properties['role'] = None
    access_entries.append(existing_view_access_fixed)

    isInAccessEntries_v3 = view_access in access_entries  # True


if __name__ == "__main__":
    main()
```

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
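For illustration only, the mismatch described under "Anything else" can be reproduced without a BigQuery client by using plain dictionaries that mirror the `_properties` contents reported above. This is a minimal sketch, not the Airflow or google-cloud-bigquery implementation: `normalize_entry` and `has_entry` are hypothetical helper names, the project, dataset, and view names are made up, and it assumes the only difference between the two entries is a `role` key set to `None`.

```python
from typing import Any, Dict, Iterable

# Hypothetical helpers for illustration; they are not part of Airflow or
# google-cloud-bigquery and operate on plain dicts only.


def normalize_entry(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Drop keys whose value is None, so a missing role equals role=None."""
    return {key: value for key, value in properties.items() if value is not None}


def has_entry(existing: Iterable[Dict[str, Any]], wanted: Dict[str, Any]) -> bool:
    """Return True if `wanted` matches any existing entry after normalization."""
    wanted_normalized = normalize_entry(wanted)
    return any(normalize_entry(entry) == wanted_normalized for entry in existing)


# Made-up identifiers standing in for the placeholders in the script above.
view_ref = {"projectId": "my-project", "datasetId": "dataset_a", "tableId": "my_view"}

# Shape reported for the fetched entry: only a "view" key.
fetched_entries = [
    {"role": "READER", "specialGroup": "projectReaders"},
    {"view": view_ref},
]

# Shape reported for the locally created entry: "view" plus "role" set to None.
created_entry = {"role": None, "view": view_ref}

naive_match = created_entry in fetched_entries  # plain equality check
normalized_match = has_entry(fetched_entries, created_entry)

print(naive_match, normalized_match)  # prints: False True
```

The plain membership test fails because the extra `role: None` key makes the two dicts unequal, while the normalized comparison treats an absent role and a `None` role as the same thing. Whether a comparable normalization is the right fix for the hook itself is a question for the maintainers.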
