luke-hoffman1 opened a new issue, #47587:
URL: https://github.com/apache/airflow/issues/47587

   ### Apache Airflow Provider(s)
   
   google, openlineage
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==10.26.0
   apache-airflow-providers-openlineage==2.1.0
   
   ### Apache Airflow version
   
   2.10.5
   
   ### Operating System
   
   macOS Sequoia Version 15.3.1 (24D70)
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   FROM quay.io/astronomer/astro-runtime:12.7.1
   
   ### What happened
   
   When I execute a CTAS statement that references a view, OpenLineage reports 
the view's underlying tables as inputs instead of the view itself. I suspect 
this is because the BigQuery Job API exposes the underlying tables more 
readily than the view name. The only place I can find the view name is the 
`configuration.query.query` property, whereas the input tables appear to be 
taken from the `statistics.query.referencedTables` property.
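
To illustrate the difference between the two properties, here is a minimal sketch. It assumes a trimmed-down job resource; the project, dataset, table, and view names are placeholders, and the helper functions `tables_from_statistics` and `tables_from_sql` are hypothetical, not part of the provider. The regex is a deliberately naive stand-in for a real SQL parser.

```python
import re

# Hypothetical, trimmed-down BigQuery job resource. All names are placeholders;
# the referencedTables content mirrors the behaviour described in this issue.
job = {
    "configuration": {
        "query": {
            "query": (
                "CREATE OR REPLACE TABLE my_dataset.table1 AS "
                "SELECT * FROM my_dataset.my_view;"
            ),
        }
    },
    "statistics": {
        "query": {
            "referencedTables": [
                {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "base_table_a"},
                {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "base_table_b"},
            ]
        }
    },
}


def tables_from_statistics(job):
    """Return dataset.table names listed in statistics.query.referencedTables."""
    refs = job.get("statistics", {}).get("query", {}).get("referencedTables", [])
    return [f"{ref['datasetId']}.{ref['tableId']}" for ref in refs]


def tables_from_sql(job):
    """Return identifiers that follow FROM or JOIN in configuration.query.query.

    Naive on purpose: it ignores CTEs, subqueries, and comments.
    """
    sql = job.get("configuration", {}).get("query", {}).get("query", "")
    return re.findall(r"\b(?:FROM|JOIN)\s+`?([\w.-]+)`?", sql, flags=re.IGNORECASE)


print(tables_from_statistics(job))
print(tables_from_sql(job))
```

With this sample input, only the SQL text yields the view name; the statistics list contains just the base tables.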
   
   I believe this is the relevant 
[code](https://github.com/apache/airflow/blob/eb18f87f091116a9b7db5ae30fdb40f6e0a6377f/providers/google/src/airflow/providers/google/cloud/openlineage/mixins.py#L231)
   
   ### What you think should happen instead
   
   It would be beneficial to receive the view name as the OpenLineage input 
instead of the underlying table names, as this would ensure we capture the 
complete lineage.
   
   ### How to reproduce
   
   DAG:
   
   ```
   from airflow import DAG
   from airflow.providers.google.cloud.operators.bigquery import (
       BigQueryInsertJobOperator
   )
   from datetime import datetime
   
   dag = DAG(
       dag_id="dag_execute_bq_ctas",
       schedule_interval=None,
       start_date=datetime(2025, 3, 4),  # Start date
   )
   
   task1 = BigQueryInsertJobOperator(
       task_id="task1",
       gcp_conn_id="bq_conn",
       configuration={
           "query": {
               # Replace <bq-dataset> and <view-name> with real identifiers.
               "query": (
                   "CREATE OR REPLACE TABLE <bq-dataset>.table1 AS "
                   "SELECT * FROM <bq-dataset>.<view-name>;"
               ),
               "useLegacySql": False,
               "priority": "BATCH",
           }
       },
       dag=dag,
   )
   
   task1
   ```
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
