xBis7 commented on PR #43941:
URL: https://github.com/apache/airflow/pull/43941#issuecomment-2512595217

   My initial approach didn't account for scheduler HA. I've updated the patch 
accordingly.
   
   There have been two main challenges:
   * OpenTelemetry spans are designed so that only the process that starts 
them can end them
     * The span objects can't be shared or stored in a db
   * The Airflow philosophy for scheduler HA is that the only shared state 
between multiple schedulers is the db
     * It is very common for one scheduler to start a dag (and its span) 
and for another scheduler to finish the dag (which should end the span)
   
   To avoid breaking things, each scheduler will keep a list of the spans that 
it started and will be solely responsible for ending them. We will add two new 
columns to both the `dagrun` table and the `ti` table.
   
   The new columns will be:
   * `context_carrier`
       * used for propagating the context and creating sub-spans
   * `span_status`
       * keeps track of the span status and tells each scheduler 
how to handle the span
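
To illustrate why a carrier can be stored in the db while a span object cannot, here is a minimal stdlib-only sketch. It is not the code from this PR and not the OpenTelemetry API: it only models a carrier as a plain dict holding a W3C `traceparent` string. The helper names (`new_carrier`, `parse_carrier`, `child_carrier`) are hypothetical.

```python
import json
import secrets


def new_carrier() -> dict:
    """Build a carrier for a fresh root span (hypothetical helper)."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    # W3C traceparent layout: version-traceid-spanid-flags
    return {"traceparent": f"00-{trace_id}-{span_id}-01"}


def parse_carrier(carrier: dict) -> dict:
    """Split the traceparent string back into its fields."""
    version, trace_id, span_id, flags = carrier["traceparent"].split("-")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "flags": flags}


def child_carrier(parent: dict) -> dict:
    """Carrier for a sub-span: same trace id, new span id."""
    fields = parse_carrier(parent)
    span_id = secrets.token_hex(8)
    return {"traceparent": f"00-{fields['trace_id']}-{span_id}-{fields['flags']}"}


# The carrier is plain text, so it survives a round trip through a db column.
dag_carrier = new_carrier()
stored = json.dumps(dag_carrier)             # what would be written to the column
task_carrier = child_carrier(json.loads(stored))
```

Any process that reads the stored carrier can create sub-spans under the same trace, but it still cannot end the original span, which is why `span_status` is needed as well.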
   
   Possible scenarios with 2 schedulers (this also works with more):
   1. Scheduler1 starts a dag and finishes processing it
       * This is straightforward
         ```
         dag span
           |_ task1 span
                 |_ task1 sub span
           |_ task2 span
         ```
   2. Scheduler1 starts a dag while another scheduler finishes it
       * The visualized result will be the same as scenario 1
         ```
         dag span
           |_ task1 span
                 |_ task1 sub span
           |_ task2 span
         ```
       * Scheduler2 will set the span status to `SHOULD_END` and scheduler1 
will end the spans during its next loop iteration.
   3. Scheduler1 starts a dag, exits gracefully, and another scheduler picks up 
the dag and finishes it
       * Since scheduler1 exits gracefully, e.g. with a `SIGTERM` or `SIGINT` 
signal, it can end its spans and update the status before exiting
       * Scheduler2 will create a continued span for each prematurely ended 
span
          ```
          original       |----|
          continued           |-------|
   
         dag span
           |_ task1 span
           |_ scheduler exited span
           |_ new scheduler span
           |_ dag span (continued suffix)
                 |_ task1 span (continued suffix)
                       |_ task1 sub span
                 |_ task2 span
         ```
   4. Scheduler1 starts a dag, exits forcefully, and another scheduler picks up 
the dag and finishes it
       * In this case scheduler1 exited with active spans. Airflow has a 
standard way of declaring a scheduler unhealthy and also stores the id of the 
scheduler job that started a dagrun or a ti
       * If the scheduler that started the dagrun or the ti is unhealthy, we 
can recreate the lost spans
       * If a task is still running and the part of it that hasn't executed 
yet is supposed to create sub-spans, those sub-spans will use the newly 
recreated span as their parent
         ```
         dag span (recreated suffix)
           |_ task1 span (recreated suffix)
                 |_ task1 sub span
           |_ task2 span
         ```
       * If a task has finished and it created sub-spans, those sub-spans 
reference a parent span that was lost along with the unhealthy scheduler. 
We recreate the span of the task itself, but the only way to get the 
sub-spans back is to rerun the task
         ```
         dag span (recreated suffix)
           |_ task1 span (recreated suffix)
           |_ task2 span
         ```
   
   I'll do some testing to make sure nothing broke when merging with main, 
and then I'll move the PR out of draft.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
