alexkruc opened a new pull request, #25280:
URL: https://github.com/apache/airflow/pull/25280
Based on the discussion on #24728, this is the PR to allow the users to add
their own hyperlinks to the `owner` property.
The idea behind this PR is as follows -
Similar to what we do with DAG tags, I've created a separate table in the DB
to store the links with the following schema:`dag_id :: owner_name ::
owner_link`.
Before I reached this implementation, I tried saving the dict in the
current `owners` column, but I faced challenges and was convinced that it was
not the right way of implementation. The tradeoffs, as I see them are:
1. The current `owners` field is a property of a DAG that contains all of
them. This is a string column. To add the new implementation there, I had to
save as strings the following object: `{owner1: link1}, {owner2:link2}`. This
is not native to work with, as it's a list of dicts inside a string column.
Also, delimiter problems are also occurring here, and if we change the
delimiter of the `owners` field (`,` -> `;` for example), it breaks a lot of
the current flow.
2. Link length - The links (especially `mailto` links that can contain a
message template) can be long. The current `owners` column length is `2000`,
and in DAGs with multiple owners, it can be a problem. By moving this logic to
its own table, we can manage the link length independently.
3. Separation of concerns - Only the `views.py` file and the `DagModel` (for
the orm bulk write) needs to be aware of owner links, all the other components
can keep getting their `owners` field in the same way they are used to :)
This PR was tested on the following flows manually using local Airflow
started by `breeze`:
1. Using multiple DBs to verify the schema migration (SQLite, Postgres),
started by `--backend` param to `breeze`.
2. Verified that the new table is getting records added, edited, and deleted
based on live DAG edits.
3. Verified that in the DAGs view page, owners with valid links are
transferred into a hyperlink and opened in a new tab for quality of life :)
4. Bad formatting of URLs, and potential exploits (calling hyperlink with
`javascript:alert(1)` for example), are not getting loaded into the system and
thrown out with `AirflowException`.
5. Added `Owner Links` property to `Dag Details` page that shows the owner
links, and `None` if no links existing in the DAG.
The DAGs I've used to test it (set in the local `files/dags/` folder):
1. dag_with_owner_links.py
```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
owner_obj = {'name': "test_owner_link", "link":
"https://www.google.com/search?q=blabla"}
default_args = {'owner': owner_obj, "start_date": datetime(2021, 9, 9),
"retries": 1,
'execution_timeout': timedelta(minutes=30)}
dag = DAG("test_dag_with_links", default_args=default_args,
schedule_interval=None,
catchup=False)
with dag:
task = BashOperator(task_id='task_with_link', bash_command='echo Hello')
task2 = BashOperator(task_id='task_with_no_link', bash_command='echo
"Hello Again"', owner="bla")
task3 = BashOperator(task_id='task_with_another_link',
bash_command='echo "Hello Again"',
owner={'name': "test_owner_email",
'link':
"mailto:[email protected]?subject=Mail from Our Site"})
```
2. dag_with_no_links.py
```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
default_args = {'owner': "test_owner_link", "start_date": datetime(2021, 9,
9), "retries": 1,
'execution_timeout': timedelta(minutes=30)}
dag = DAG("test_dag_with_no_links", default_args=default_args,
schedule_interval=None,
catchup=False)
with dag:
task = BashOperator(task_id='hello', bash_command='echo Hello')
task2 = BashOperator(task_id='hello_again', bash_command='echo "Hello
Again"', owner="bla")
task3 = BashOperator(task_id='hello_again_2',
bash_command='echo "Hello Again"',
owner="blabla")
```
3. bad_link_format.py
```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
owner_obj = {'name': "test_owner_link", "link": "wrong-website-format@bla"}
default_args = {'owner': owner_obj, "start_date": datetime(2021, 9, 9),
"retries": 1,
'execution_timeout': timedelta(minutes=30)}
dag = DAG("test_dag_with_bad_formatted_links", default_args=default_args,
schedule_interval=None,
catchup=False)
with dag:
task = BashOperator(task_id='hello', bash_command='echo Hello')
```
4. dag_with_link_explot.py
```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
owner_obj = {'name': "test_owner_link", "link": "javascript:alert(1)"}
default_args = {'owner': owner_obj, "start_date": datetime(2021, 9, 9),
"retries": 1,
'execution_timeout': timedelta(minutes=30)}
dag = DAG("test_dag_with_exploited_link", default_args=default_args,
schedule_interval=None,
catchup=False)
with dag:
task = BashOperator(task_id='hello', bash_command='echo Hello')
```
closes: #24728
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]