tatiana opened a new issue, #39486:
URL: https://github.com/apache/airflow/issues/39486

   ### Apache Airflow version
   
   2.9.1
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   Valid DAGs that worked in Airflow 2.8.x  and had tasks with outlets with 
specific URIs, such as 
`Dataset("postgres://postgres:5432/postgres.dbt.stg_customers")`, stopped 
working in Airflow 2.9.0, after #37005 was merged.
   
   ### What you think should happen instead?
   
   This is a breaking change in an Airflow minor version. We should avoid this.
   
   I believe Airflow < 3.0 should raise a warning, and from Airflow 3.0, we can 
make errors by default. We can have a feature flag to allow users who want to 
see this in advance to enable errors in Airflow 2. x, but this should not be 
the default behavior.
   
   The DAGs should continue working on Airflow 2.x minor/micro releases, 
without errors (unless the user opt-in via configuration).
   
   ### How to reproduce
   
   By running the following DAG, as an example:
   ```
   from datetime import datetime
   
   from airflow import DAG
   from airflow.datasets import Dataset
   from airflow.operators.empty import EmptyOperator
   
   
   
   with DAG(dag_id='empty_operator_example', start_date=datetime(2022, 1, 1), 
schedule_interval=None) as dag:
   
       task1 = EmptyOperator(
           task_id='empty_task1',
           dag=dag,
           
outlets=[Dataset("postgres://postgres:5432/postgres.dbt.stg_customers")]
       )
   
       task2 = EmptyOperator(
           task_id='empty_task2',
           dag=dag
       )
   
       task1 >> task2
   ```
   
   Causes to the exception:
   ```
   Broken DAG: [/usr/local/airflow/dags/example_issue.py]
   Traceback (most recent call last):
     File 
"/usr/local/lib/python3.11/site-packages/airflow/datasets/__init__.py", line 
81, in _sanitize_uri
       parsed = normalizer(parsed)
                ^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/providers/postgres/datasets/postgres.py",
 line 34, in sanitize_uri
       raise ValueError("URI format postgres:// must contain database, schema, 
and table names")
   ValueError: URI format postgres:// must contain database, schema, and table 
names
   ```
   
   ### Operating System
   
   All
   
   ### Versions of Apache Airflow Providers
   
   N/A
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   This issue can be seen in any Airflow 2.9.0 or 2.9.1 deployment.
   
   ### Anything else?
   
   Suggested approach to fix the problem:
   
   1. Introduce a boolean configuration within `[core],` named 
`strict_dataset_uri_validation,` which should be `False` by default.
   
   2. When this configuration is `False,` Airflow should raise a warning saying:
   ```
   From Airflow 3, Airflow will be more strict with Dataset URIs, and the URI 
xx will no longer be valid. Please, follow the expected standard as documented 
in XX.
   ```
   
   3. If this configuration is `True`, Airflow should raise the exception, like 
it does now in Airflow 2.9.0 and 2.9.1
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to