I like the vision and the effect. Even though we do not have much experience with OTEL, I have seen what it can do and I like what it provides. I also found it very appealing when I saw it in action during some talks.
Maybe it would be useful to see a somewhat realistic example of what can be done with it? For instance, showing what a user could do in a DAG, with screenshots (or a mock-up of how it could look)? I think that would give others a quick way of assessing what users will get out of it.

But I believe there is no fundamental issue with adding an OTEL provider, as this is an open industry standard that is popular and getting even more popular nowadays. I like how it fits into the "Airflow as a platform" approach, especially since it will be a provider, not a "core" modification.

J.

On Wed, Jul 31, 2024 at 9:28 PM Howard Yoo <howard...@gmail.com> wrote:

> As part of AIP-49, task logs are actually included, as span events.
> This can be turned on or off, but when it is enabled, the content of the
> task log is included as span events and emitted.
> Since task logs can be quite large, the log contents can be truncated as
> necessary, depending on the OTEL backend.
>
> An OTEL-instrumented DAG would create a trace of the DAG run, based on
> which task instances were invoked and executed, and when. Much like the
> flow chart in Airflow, you can monitor the DAG's execution via any
> OpenTelemetry-compatible backend. Aside from the DAG run, there will be a
> span link connecting a particular task instance run with the instance (or
> loop) of the scheduler, so that you would also know the relationship
> between what was happening on the scheduler's side and the DAG run itself.
> Instrumentation would also allow users to attach custom attributes or
> spans to this DAG run graph (which can be achieved easily using the OTEL
> provider), so that they can see not only the OOTB DAG run graph but also
> their own spans and instrumentation as part of it, which enriches the
> monitoring.
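To make the trace shape described above a bit more concrete, here is a toy sketch in plain Python. It is not the OpenTelemetry API and not the provider's code: the `Span` class, the `MAX_EVENT_CHARS` limit, and the span and attribute names are all made up for illustration. It only models the relationships mentioned above: a task span under a DAG run span, a link to a scheduler loop span, a custom attribute, and task log text attached as a (possibly truncated) span event.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical cap on the log text carried by one span event.
# Real OTEL backends define their own limits; 64 is only for illustration.
MAX_EVENT_CHARS = 64

_ids = itertools.count(1)


@dataclass
class Span:
    """Toy stand-in for a trace span (not the OpenTelemetry Span class)."""

    name: str
    parent: Optional["Span"] = None                 # parent/child relationship
    attributes: dict = field(default_factory=dict)  # custom key/value data
    events: list = field(default_factory=list)      # point-in-time records
    links: list = field(default_factory=list)       # related, non-parent spans
    span_id: int = field(default_factory=lambda: next(_ids))

    def add_log_event(self, text: str) -> None:
        """Attach task-log text as a span event, truncating oversized content."""
        self.events.append(
            {
                "name": "task.log",
                "body": text[:MAX_EVENT_CHARS],
                "truncated": len(text) > MAX_EVENT_CHARS,
            }
        )


def build_example_trace() -> Span:
    """Build one task span wired up the way the description above suggests."""
    scheduler_loop = Span("scheduler_loop")
    dag_run = Span("dag_run:example_dag")
    # The task span is a child of the DAG run and is linked to the scheduler loop.
    task = Span("task:extract", parent=dag_run, links=[scheduler_loop])
    # A user-defined attribute added on top of the out-of-the-box data.
    task.attributes["rows_read"] = 1200
    # A long log message is stored as an event and cut to the assumed limit.
    task.add_log_event("x" * 200)
    return task
```

In a real setup the same structure would be produced through an OpenTelemetry tracer and exported to a backend; the sketch only shows what ends up connected to what.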
>
> Utilizing this data, it is hoped that in larger Airflow environments it
> will be much easier to monitor failures, incidents, or performance issues,
> since the traces, metrics, and logs can be collected in a single place and
> easily correlated.
>
> On Thu, Jul 25, 2024 at 5:45 PM Vikram Koka <vik...@astronomer.io.invalid>
> wrote:
>
> > Howard,
> >
> > I am intrigued by this, but unclear on what this would actually look
> > like and what benefits it would add.
> >
> > Specifically, I believe that AIP-49 adds support for OTEL emission of
> > metrics and traces, but NOT task logs from Airflow.
> >
> > I am probably being dense here, but I don't quite understand what the
> > OTEL instrumented DAG would look like, and how/when the instrumentation
> > would be utilized. Can you please elaborate on this?
> >
> > Best regards,
> > Vikram
> >
> > On Wed, Jul 24, 2024 at 12:51 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > @howard: What sort of Operators or Hooks are you planning for the OTEL
> > > provider?
> > >
> > > I am in favour of deeper integration for OTEL and Airflow but I don't
> > > know what Operators, Hooks or other things will be part of the
> > > provider.
> > >
> > > Regards,
> > > Kaxil
> > >
> > > On Wed, 24 Jul 2024 at 01:00, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > +1. I like the idea that it will add a possibility to customize OTEL
> > > > metrics and spans. With Airflow 2.10 I would also love to see some
> > > > guidelines, a description, and maybe a simple how-to on how you can
> > > > make more use of OTEL. For example, users could use
> > > > auto-instrumentation for sqlalchemy, flask and other libraries we
> > > > are using, or monitor memory, CPU, processes, etc. (this is all
> > > > available out of the box in OTEL). Documentation describing these
> > > > options and what users can do with them would be great, and the
> > > > provider seems to be a good place for it, and maybe even for some
> > > > ways to enable those things more easily.
> > > >
> > > > Maybe just loosely related, but one thing that I am particularly
> > > > looking forward to is the ability for our users to make a snapshot
> > > > of a problem they see (with traces) and send it to us. I know Jaeger
> > > > has such an option, and I saw what you can do with it, especially if
> > > > you capture a lot of information. This would address the situation
> > > > where we always tell our users "But I have no way to inspect your
> > > > system, so I can't tell you what is wrong". Having such a snapshot
> > > > that you can load locally, especially with a lot of
> > > > auto-instrumentation enabled, might be a fantastic way to help our
> > > > users. But in order to do that, we need to give them some
> > > > easy-to-follow instructions.
> > > >
> > > > If that would be part of the work then I am even +10 on that.
> > > >
> > > > J.
> > > >
> > > > On Tue, Jul 23, 2024 at 2:16 AM Howard Yoo <howard...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Apache Airflow Community,
> > > > >
> > > > > I hope this message finds you well.
> > > > >
> > > > > I am writing to propose the addition of a new provider to Apache
> > > > > Airflow for OpenTelemetry (https://opentelemetry.io).
> > > > > OpenTelemetry is an emerging standard for instrumentation of
> > > > > services and applications, and it has recently matured and gained
> > > > > huge popularity.
> > > > >
> > > > > Recently, AIP (Airflow Improvement Proposal) no. 49 was raised to
> > > > > implement OpenTelemetry support for Apache Airflow, which will
> > > > > enable Airflow to emit metrics, traces, and task logs in
> > > > > OpenTelemetry (PRs: https://github.com/apache/airflow/pull/37948,
> > > > > https://github.com/apache/airflow/pull/40802).
> > > > >
> > > > > Since this feature is to be released soon in a future Airflow
> > > > > version, having this provider will give users more means to
> > > > > instrument their DAGs. This OTEL provider can work independently
> > > > > of Airflow's OTEL implementation, as well as in conjunction with
> > > > > it if the feature is available and enabled. Any DAGs instrumented
> > > > > with the OTEL provider will work with Airflow versions that may
> > > > > not have OTEL support, but also seamlessly with Airflow versions
> > > > > that support OTEL, providing OTEL for everybody.
> > > > >
> > > > > I am willing to contribute to the development and integration
> > > > > effort to ensure a smooth and effective implementation. Please let
> > > > > me know if there are any specific guidelines or processes that I
> > > > > should follow to initiate this proposal.
> > > > >
> > > > > Thanks and regards,
> > > > > Howard Yoo