alexkruc opened a new pull request, #25280:
URL: https://github.com/apache/airflow/pull/25280

   Based on the discussion on #24728, this is the PR to allow the users to add 
their own hyperlinks to the `owner` property.
   
   The idea behind this PR is as follows - 
   Similar to what we do with DAG tags, I've created a separate table in the DB 
to store the links with the following schema:`dag_id :: owner_name :: 
owner_link`. 
    Before I reached this implementation, I tried saving the dict in the 
current `owners` column, but I faced challenges and was convinced that it was 
not the right way of implementation. The tradeoffs, as I see them are:
   1. The current `owners` field is a property of a DAG that contains all of 
them. This is a string column. To add the new implementation there, I had to 
save as strings the following object: `{owner1: link1}, {owner2:link2}`. This 
is not native to work with, as it's a list of dicts inside a string column. 
Also, delimiter problems are also occurring here, and if we change the 
delimiter of the `owners` field (`,` -> `;` for example), it breaks a lot of 
the current flow.
   2.  Link length - The links (especially `mailto` links that can contain a 
message template) can be long. The current `owners` column length is `2000`, 
and in DAGs with multiple owners, it can be a problem. By moving this logic to 
its own table, we can manage the link length independently. 
   3. Separation of concerns - Only the `views.py` file and the `DagModel` (for 
the orm bulk write) needs to be aware of owner links, all the other components 
can keep getting their `owners` field in the same way they are used to :) 
   
   This PR was tested on the following flows manually using local Airflow 
started by `breeze`:
   1. Using multiple DBs to verify the schema migration (SQLite, Postgres), 
started by `--backend` param to `breeze`.
   2. Verified that the new table is getting records added, edited, and deleted 
based on live DAG edits.
   3. Verified that in the DAGs view page, owners with valid links are 
transferred into a hyperlink and opened in a new tab for quality of life :) 
   4. Bad formatting of URLs, and potential exploits (calling hyperlink with 
`javascript:alert(1)` for example), are not getting loaded into the system and 
thrown out with `AirflowException`.
   5. Added `Owner Links` property to `Dag Details` page that shows the owner 
links,  and `None` if no links existing in the DAG.
   
   The DAGs I've used to test it (set in the local `files/dags/` folder):
   1. dag_with_owner_links.py
   ```python
   from datetime import datetime, timedelta
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   owner_obj = {'name': "test_owner_link", "link": 
"https://www.google.com/search?q=blabla"}
   
   default_args = {'owner': owner_obj, "start_date": datetime(2021, 9, 9), 
"retries": 1,
                   'execution_timeout': timedelta(minutes=30)}
   dag = DAG("test_dag_with_links", default_args=default_args, 
schedule_interval=None,
             catchup=False)
   
   with dag:
       task = BashOperator(task_id='task_with_link', bash_command='echo Hello')
       task2 = BashOperator(task_id='task_with_no_link', bash_command='echo 
"Hello Again"', owner="bla")
       task3 = BashOperator(task_id='task_with_another_link',
                            bash_command='echo "Hello Again"',
                            owner={'name': "test_owner_email",
                                   'link': 
"mailto:[email protected]?subject=Mail from Our Site"})
   ```
   
   2. dag_with_no_links.py
   ```python
   from datetime import datetime, timedelta
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   
   default_args = {'owner': "test_owner_link", "start_date": datetime(2021, 9, 
9), "retries": 1,
                   'execution_timeout': timedelta(minutes=30)}
   dag = DAG("test_dag_with_no_links", default_args=default_args, 
schedule_interval=None,
             catchup=False)
   
   with dag:
       task = BashOperator(task_id='hello', bash_command='echo Hello')
       task2 = BashOperator(task_id='hello_again', bash_command='echo "Hello 
Again"', owner="bla")
       task3 = BashOperator(task_id='hello_again_2',
                            bash_command='echo "Hello Again"',
                            owner="blabla")
   ```
   
   3. bad_link_format.py
   ```python
   from datetime import datetime, timedelta
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   owner_obj = {'name': "test_owner_link", "link": "wrong-website-format@bla"}
   
   default_args = {'owner': owner_obj, "start_date": datetime(2021, 9, 9), 
"retries": 1,
                   'execution_timeout': timedelta(minutes=30)}
   dag = DAG("test_dag_with_bad_formatted_links", default_args=default_args, 
schedule_interval=None,
             catchup=False)
   
   with dag:
       task = BashOperator(task_id='hello', bash_command='echo Hello')
   ```
   
   4. dag_with_link_explot.py
   ```python
   from datetime import datetime, timedelta
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   owner_obj = {'name': "test_owner_link", "link": "javascript:alert(1)"}
   
   default_args = {'owner': owner_obj, "start_date": datetime(2021, 9, 9), 
"retries": 1,
                   'execution_timeout': timedelta(minutes=30)}
   dag = DAG("test_dag_with_exploited_link", default_args=default_args, 
schedule_interval=None,
             catchup=False)
   
   with dag:
       task = BashOperator(task_id='hello', bash_command='echo Hello')
   ```
   
   
   closes: #24728
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to