ttippinyu commented on issue #44738:
URL: https://github.com/apache/airflow/issues/44738#issuecomment-2543013976
Hi, we have reproduced this on the latest Airflow version, 2.10.3. The issue
seems to happen when the task id itself is shorter than 250 characters, but the
group id prefix + task id is longer than 250. The reproduction steps are:
1. start `airflow standalone`
2. create a dag
```
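import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.utils.task_group import TaskGroup

with DAG(
    dag_id="foo",
    schedule=None,
    start_date=datetime.datetime(2022, 3, 4),
    catchup=False,
    tags=["example", "params"],
) as dag:
    # Group prefix (20 chars) + "." + task id (20 chars) = 41 chars in total.
    with TaskGroup("A" * 20):
        EmptyOperator(task_id="1" * 20)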
```
3. trigger the dag
4. disable the dag
5. clear the task
6. update the dag so that the task_id is now `"1" * 240` (so that the full
task id now has length 261, counting the `"A" * 20` group prefix and the `.` separator)
7. if we check the airflow metadb now, we will find that the task id in
`serialized_dag` is
`AAAAAAAAAAAAAAAAAAAA.111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111`,
which has length 261
8. enable the DAG; the scheduler will then crash [at this
line](https://sourcegraph.com/github.com/apache/[email protected]/-/blob/airflow/models/baseoperator.py?L960)
9. fix the task id so it is short again
10. try to start the scheduler by running `airflow scheduler`; the scheduler
will again raise an exception when it calls `_run_scheduler_loop`
11. after a few attempts, `airflow scheduler` succeeds and the short
task id is serialized in the DB
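To make the lengths in the steps above concrete, here is a small sketch in plain Python (no Airflow import). The helper name `full_task_id` is hypothetical, and the `.` separator is assumed from the serialized value shown in step 7; the 250 limit is the figure quoted in this report.

```python
# Illustration only: these names are not part of Airflow's API.
MAX_KEY_LENGTH = 250  # assumption: the limit quoted in this report


def full_task_id(group_id: str, task_id: str) -> str:
    """Join the group prefix and the task id with a dot, as seen in step 7."""
    return f"{group_id}.{task_id}"


group_id = "A" * 20
short_id = full_task_id(group_id, "1" * 20)   # steps 2 and 9
long_id = full_task_id(group_id, "1" * 240)   # step 6

# The bare task id passes a per-task check, but the full id does not fit.
print(len("1" * 240) <= MAX_KEY_LENGTH)  # True
print(len(short_id), len(long_id))       # 41 261
print(len(long_id) <= MAX_KEY_LENGTH)    # False
```

The bare task id of 240 characters is under the limit, while the prefixed id of 261 characters is over it, which matches the behaviour described in steps 6 to 8.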
At a glance, this seems to happen because `BaseOperator` validates the task id
without the group id prefix, whereas the metadb stores the "full" task id,
which includes the group id prefix. Hence, when `_run_scheduler_loop` runs
with a full task id longer than 250 characters, it throws an exception.
I haven't been able to test on the latest `main`, but looking at the call to
`validate_key` in `BaseOperator`
[here](https://sourcegraph.com/github.com/apache/airflow/-/blob/task_sdk/src/airflow/sdk/definitions/baseoperator.py?L735),
it seems to validate only the individual task id and not the full task id
(group id + task id).