xBis7 opened a new pull request, #43941:
URL: https://github.com/apache/airflow/pull/43941

   <!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at
   
      http://www.apache.org/licenses/LICENSE-2.0
   
    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
    -->
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   This PR is not a work in progress but I've set it as a draft because this is 
my first time contributing to Airflow and I'm not sure if there should be an 
AIP or a mailing discussion for such changes. I'd appreciate any feedback.
   
   ## Issue description
   
   related: https://github.com/apache/airflow/pull/40802
   
   There are some OpenTelemetry standard practices that help keep the usage 
consistent across multiple projects. According to those
   * when a root span starts, the trace id and the span id are generated 
randomly
   * while the span is active, the context is captured
     * injected into a carrier object which is a map or a python dictionary
   * the carrier with the captured context is propagated across services and 
used to create sub spans
   * the new sub-span extracts the parent context from the carrier and uses it 
to set the parent
     * the trace id is accessed from the carrier and the span id is generated 
randomly
   
   To explain more, this is what the flow of operations should be
   1. Dag run starts - start `dagrun` span
       i. Get the `dagrun` span context
   2. Start task - with the `dagrun` context, start `ti` span
       i. Get the `ti` span context
   4. With the `ti` span context create task-sub span
   5. Task finishes - `ti` span ends
   6. Dag run finishes - `dagrun` span ends 
   
   Airflow follows a different approach
   * trace and span ids are generated in a deterministic way from various 
properties  of the `dag_run` and the `task_instance`
   * the spans for a dag and its tasks, are generated after the run has finished
   
   This is the flow of the current implementation
   1. Dag run starts
   2. Tasks start
   3. Tasks finish
   4. Create spans for the tasks
   5. Dag run finishes
   6. Create span for the dag run
   
   The current approach makes it impossible to create spans from under tasks 
while using the existing airflow code. To achieve that, you need to use 
https://github.com/howardyoo/airflow_otel_provider which has to generate the 
same trace id and span id as airflow otherwise the spans won't be properly 
associated. This isn't easily maintainable and it also makes it hard for people 
that are familiar with otel but new to airflow, to start using the feature.
   
   These are some references to OpenTelemetry docs
   
   https://opentelemetry.io/docs/concepts/context-propagation/
   https://opentelemetry.io/docs/languages/python/propagation/
   https://www.w3.org/TR/trace-context/
   
   ## Implementation description
   
   A lot of people might already be using airflow with the existing otel 
implementation. To avoid any inconvenience, the changes are hidden behind a 
config flag.
   
   This patch is extending the existing implementation and not changing it. 
Once the flag is turned on, a new set of spans gets generated and exported. The 
new spans have the suffix `_ctx_prop` on their name.
   
   
![image](https://github.com/user-attachments/assets/13284c6a-8ee1-4335-b4f3-d0235b9cdcc9)
   
   I've reused the attributes from the original implementation. In addition, 
the timings are the same.
   
   * without context propagation 
   
![image](https://github.com/user-attachments/assets/f116a4a7-694a-4af9-bf8a-dfb8d51e120c)
   * with context propagation
   
![image](https://github.com/user-attachments/assets/300d027d-821e-4510-9d79-a82b64576acb)
   
   To be able to propagate the context of a span, the span must be active. 
   
   For example, for a dag run, the span can't be created at the end but 
   * it needs to start with the run
   * stay active so that the context can be captured and propagated
   * end once the run also ends
   
   Same goes for a task and any sub-spans.
   
   With this approach, we can use the new otel methods for creating spans 
directly from under a task without the need of the `airflow_otel_provider`. 
These spans will be children of the task span.
   
   Check `test_otel_dag.py` for an example of usage. 
   
   ## Testing
   
   I've added a unit test for the `otel_tracer` methods and an integration test 
that runs a dag and then asserts the parent child relationships for the dag 
span, the task spans and the task sub spans.
   
   I've also tested the changes manually with a `PythonVirtualenvOperator` 
without issues.
   
   <!-- Please keep an empty line above the dashes. -->
   ---
   **^ Add meaningful description above**
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a 
newsfragment file, named `{pr_number}.significant.rst` or 
`{issue_number}.significant.rst`, in 
[newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to