xBis7 commented on PR #43941:
URL: https://github.com/apache/airflow/pull/43941#issuecomment-2512595217
My initial approach wasn't considering scheduler HA. I've updated the patch
accordingly.
There have been two main challenges
* Opentelemetry spans are designed so that only the process that starts
them, can end them
* The span objects can't be shared or stored to a db
* The airflow philosophy for scheduler HA is that the only shared state
between multiple schedulers is the db
* It is very common that one scheduler starts a dag (also starts the span)
and another scheduler finishes the dag (should end the span)
To avoid breaking things, each scheduler will have a list of the spans that
it started and will be solely responsible for ending them. We will save two new
attributes on the `dagrun` table and the `ti` table.
The new columns will be
* `context_carrier`
* this is used for propagating the context and creating sub spans
* `span_status`
* this is keeping track of the span status and notifying each scheduler
of how to handle the span
Possible scenarios with 2 scheduler (this can easily work with more)
1. Scheduler1 starts a dag and finishes processing it
* This is straight forward
```
dag span
|_ task1 span
|_ task1 sub span
|_ task2 span
```
2. Scheduler1 starts a dag while another scheduler finishes it
* The visualized result will be the same as scenario 1
```
dag span
|_ task1 span
|_ task1 sub span
|_ task2 span
```
* scheduler2 will set the span status to `SHOULD_END` and scheduler1
will end the spans during the next loop iteration.
3. Scheduler1 starts a dag, exits gracefully and another scheduler picks up
the dag and finishes it
* Since scheduler1 exits gracefully, e.g. with a `SIGTERM` or `SIGINT`
signal, we can end the spans and update the status
* scheduler2 will create a continued span, for each prematurely ended
span
```
original |----|
continued |-------|
dag span
|_ task1 span
|_ scheduler exited span
|_ new scheduler span
|_ dag span (continued suffix)
|_ task1 span (continued suffix)
|_ task1 sub span
|_ task2 span
```
4. Scheduler1 starts a dag, exits forcefully and another scheduler picks up
the dag and finishes it
* In this case scheduler1 exited with active spans. Airflow has a
standard way of declaring a scheduler unhealthy and also stores the id of the
scheduler job that started a dagrun or a ti
* If the scheduler that started the dagrun or the ti, is unhealthy, we
can recreate the lost spans
* If a task is active and running and its part that hasn't been executed
yet, is supposed to create some sub-spans, then it will use the new recreated
span as a parent
```
dag span (recreated suffix)
|_ task1 span (recreated suffix)
|_ task1 sub span
|_ task2 span
```
* If a task has finished and it created some sub-spans, then those spans
are referencing a parent span that is lost along with the unhealthy scheduler.
The only way to get the sub-spans is to re-run the task. In that case, we are
recreating the span of the task itself but we can't recreate the sub-spans
without rerunning the task
```
dag span (recreated suffix)
|_ task1 span (recreated suffix)
|_ task2 span
```
I'll do some testing to make sure nothing has been broken from merging with
main and I'll move the PR out of draft.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]