Re: [DISCUSS] Airflow error codes (AERR) design

Jarek Potiuk Tue, 12 May 2026 12:09:22 -0700

My take:

1. I see no problem with that
2. If we can find a good skill for it - yes - but I hardly see what skill
that would be :)
3. Yes, the description could simply be part of the base exception,
and we could generate documentation from it.
4. I have no opinion here - but it should be consistent.
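Point 3 can be sketched as follows. This is a minimal sketch that assumes the description lives in the exception's docstring and the first steps live in a class attribute. `AirflowException` here is a simplified stand-in, not the real class. The names `first_steps` and `render_rst` and the RST layout are hypothetical. The error code is written out by hand to keep the sketch short.

```python
import inspect


class AirflowException(Exception):
    """Simplified stand-in for Airflow's base exception (not the real class)."""

    error_code = "AERR"
    # Hypothetical attribute: short, user-facing troubleshooting hints.
    first_steps: tuple = ()


class AirflowDagNotFoundException(AirflowException):
    """The requested DAG could not be found."""

    error_code = "AERR-DAG-NOT-FOUND"
    first_steps = (
        "Check that the dag_id is spelled correctly.",
        "Check that the DAG file has been parsed without errors.",
    )


def render_rst(cls: type) -> str:
    """Build a reStructuredText docs page from the exception class itself."""
    title = cls.error_code
    lines = [title, "=" * len(title), "", inspect.getdoc(cls) or "", ""]
    lines.append(f"Raised as ``{cls.__qualname__}``.")
    if cls.first_steps:
        lines += ["", "First steps", "-----------", ""]
        lines += [f"* {step}" for step in cls.first_steps]
    return "\n".join(lines) + "\n"


# A docs build could walk the hierarchy and write one page per class
# (__subclasses__() only returns direct subclasses; a real build would recurse).
for exc_cls in AirflowException.__subclasses__():
    print(render_rst(exc_cls))
```

Keeping the text next to the class has a cost: the docs build must import the exception modules, or parse them statically. That is one possible drawback to weigh against a separate file.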


J.


On Tue, May 12, 2026 at 8:51 PM Omkar P <[email protected]> wrote:

> Hi team,
>
> Thanks for your inputs. Design suggestions in this discussion so far:
>
> 1. Exceptions should have the error code embedded inside them
> 2. Error code to exception is to be a 1:1 mapping
> 3. Error code to be derived automatically from the exception class name, e.g.
> AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL, but
> possibly with some semantic grouping, e.g. SQLAlchemy style
> 4. YAML mapping to be removed, as 1 to 3 above simplify the design
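Points 1 to 3 can be sketched as follows. This sketch assumes an `Airflow<Name>Exception` naming convention and computes the code once per class in `__init_subclass__`. `AirflowException` is again a simplified stand-in. `derive_error_code` and the `[CODE] message` format are hypothetical.

```python
import re

# Splits "DagNotFoundBackfill" into Dag/Not/Found/Backfill and keeps acronyms
# such as "SQL" in "SQLError" together as one word.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*")


def derive_error_code(class_name: str) -> str:
    """Map AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL."""
    if not (class_name.startswith("Airflow") and class_name.endswith("Exception")):
        raise ValueError(f"{class_name} does not follow Airflow<Name>Exception")
    core = class_name[len("Airflow") : -len("Exception")]
    words = _WORD.findall(core)
    if not words:
        raise ValueError(f"{class_name} has no name between prefix and suffix")
    return "AERR-" + "-".join(word.upper() for word in words)


class AirflowException(Exception):
    """Simplified stand-in for Airflow's base exception."""

    error_code = "AERR"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The code is embedded in the class and derived from its name (1:1).
        cls.error_code = derive_error_code(cls.__name__)

    def __str__(self) -> str:
        return f"[{self.error_code}] {super().__str__()}"


class AirflowDagNotFoundBackfillException(AirflowException):
    """Requested DAG could not be found for a backfill."""


print(AirflowDagNotFoundBackfillException("dag 'example' not found"))
# [AERR-DAG-NOT-FOUND-BACKFILL] dag 'example' not found
```

Because the code is computed from the class name, no lookup table is needed. Renaming a class would, however, change its public code.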
>
> I'll look into updating the PR as per the above points. Meanwhile, there are
> some open questions; it would be great if you could help think them through:
>
> 1. Some exception types the user will never see, e.g. DagCodeNotFound.
> Should we have error codes for such exceptions as well?
> 2. For the SKILL file, since we see a use case, shall we have an optional
> skill for error codes, in case someone would like to use it?
> 3. With the YAML mapping removed, we need to decide whether to keep the
> error description, first steps, and docs link inside the exception class or
> someplace else. Would keeping them inside the exception class have any
> drawbacks?
> 4. Is deriving the error code directly from the exception class name
> (AirflowNotFoundException -> AERR-NOT-FOUND) the best way to group? Or
> could we group in a more semantic way, like component-wise or similar?
> e.g. AERR/SCHED/DAG/NOTFOUND, essentially saying the DAG was not found by
> or in the scheduler
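Component-wise grouping from question 4 could also be done without any mapping. This sketch assumes a hypothetical `component` class attribute. A component-level base class sets it once, and all of that base's subclasses inherit it. `AirflowSchedulerException` and `component` are illustrative names only.

```python
import re

_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*")


class AirflowException(Exception):
    """Simplified stand-in for Airflow's base exception."""

    component = ""  # hypothetical: e.g. "SCHED", set on a component base class
    error_code = "AERR"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        core = cls.__name__.removeprefix("Airflow").removesuffix("Exception")
        parts = ["AERR"]
        if cls.component:  # inherited from the nearest base that sets it
            parts.append(cls.component)
        parts += [word.upper() for word in _WORD.findall(core)]
        cls.error_code = "-".join(parts)


class AirflowSchedulerException(AirflowException):
    component = "SCHED"


class AirflowDagNotFoundException(AirflowSchedulerException):
    """DAG was not found by or in the scheduler."""


print(AirflowDagNotFoundException.error_code)  # AERR-SCHED-DAG-NOT-FOUND
```

The trade-off is that the code then depends on where a class sits in the hierarchy. Moving an exception to another component would change its public code. Joining the parts with `/` instead of `-` would give the `AERR/SCHED/DAG/...` shape from the example.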
>
> Regards,
> Omkar
>
> On Mon, May 11, 2026 at 9:30 PM Jens Scheffler <[email protected]>
> wrote:
>
> > Hi,
> >
> > +1 to Jarek and Ash. While I generally like the idea, I'd favor _not_
> > needing a manual mapping in YAML and no code lookup table.
> >
> > Assuming a 1:1 error code to exception mapping is reasonable for 95% of
> > cases, the remaining 5% could easily be handled by splitting exceptions or
> > adding a manual code for these special cases. But for the vast majority it
> > would be great to have zero maintenance. Automated mapping from exception
> > class to error code seems reasonable.
> >
> > And for sure, it would be very, very important to support providers in
> > general. If this is only in core, then it would be half-baked. Most
> > exceptions in real life hopefully are generated in providers.
> >
> > Jens
> >
> > On 11.05.26 16:35, Jarek Potiuk wrote:
> > > Ah ... and one good thing about the auto-mapping idea. You know the
> > > saying: *The world is a slightly better place with every single line of
> > > YAML removed or not even created in the first place.* This is almost
> > > literally a quote from our "Monorepo" talk with Amogh on Talk Python To
> > > Me :).
> > >
> > > On Mon, May 11, 2026 at 4:33 PM Jarek Potiuk <[email protected]> wrote:
> > >
> > >> 1. Agree with the grouping idea. I think even when you originally
> > >> discussed it, Omkar, there were some "groups" of exceptions.
> > >> AERR-DAG-NOTFOUND-BACKFILL seems like a more suitable short name than
> > >> 0001, provided it is descriptive enough for you to easily understand what
> > >> each error means. I would hate always having to look up the error code in
> > >> a table or YAML file. We could have such a table generated and in the docs,
> > >> but essentially after seeing enough logs you should know what the short
> > >> code means without memorizing a number. It's almost inhuman to force
> > >> people to associate numeric values with meaning.
> > >>
> > >> 2. I think a 1-1 mapping of exception to code would be good. While a
> > >> short error code is useful in logs, seeing the short name in the code when
> > >> you "raise" them is counterproductive because it adds noise to something
> > >> we already have: the exception class name. On the other hand, such a class
> > >> name looks way worse in the logs.
> > >>
> > >> 3. *Idea:* Why don't we just keep the correct naming convention for our
> > >> exceptions and map them to IDs automatically (e.g.,
> > >> AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL)? I
> > >> think it ticks all the boxes:
> > >>
> > >> * 0 maintenance (just a hook to check that all exceptions follow the right
> > >> conventions)
> > >> * 0 mapping
> > >> * Code friendly
> > >> * Log friendly
> > >> * What you see is what you get, whether you look at the exception class or
> > >> the ID
> > >> * We can build an exception hierarchy that allows us to catch several
> > >> exceptions (e.g., `AirflowDagNotFoundException` being an abstract
> > >> (non-instantiable) parent of AirflowDagNotFoundBackfillException and
> > >> AirflowDagNotFoundParsingException)
> > >> * Grouping works naturally and without conscious thought, in both
> > >> exception classes and IDs
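The hierarchy and the convention hook from point 3 can be sketched as follows. `AirflowException` is a simplified stand-in. The abstract parent blocks direct instantiation in `__new__`. `convention_violations` is a hypothetical check that a hook could run. A real hook would more likely parse source files with `ast` than import every module.

```python
import re


class AirflowException(Exception):
    """Simplified stand-in for Airflow's base exception."""


class AirflowDagNotFoundException(AirflowException):
    """Abstract group: catch it, but raise one of its concrete subclasses."""

    def __new__(cls, *args, **kwargs):
        if cls is AirflowDagNotFoundException:
            raise TypeError(f"{cls.__name__} is abstract; raise a subclass instead")
        return super().__new__(cls, *args, **kwargs)


class AirflowDagNotFoundBackfillException(AirflowDagNotFoundException):
    """DAG could not be found for a backfill."""


class AirflowDagNotFoundParsingException(AirflowDagNotFoundException):
    """DAG could not be found while parsing."""


_CONVENTION = re.compile(r"Airflow[A-Z][A-Za-z0-9]*Exception")


def convention_violations(root: type) -> list:
    """Return the names of all subclasses of root that break the convention."""
    violations, pending, seen = [], list(root.__subclasses__()), set()
    while pending:
        cls = pending.pop()
        if cls in seen:
            continue
        seen.add(cls)
        if not _CONVENTION.fullmatch(cls.__name__):
            violations.append(cls.__qualname__)
        pending.extend(cls.__subclasses__())  # walk the whole hierarchy
    return sorted(violations)


try:
    raise AirflowDagNotFoundBackfillException("dag 'example' not found")
except AirflowDagNotFoundException as err:  # the group catches every subclass
    print(type(err).__name__)  # AirflowDagNotFoundBackfillException

print(convention_violations(AirflowException))  # []
```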
> > >>
> > >> Essentially, no SKILL is needed for that.
> > >>
> > >> And BTW, I think none of our "coding" should really "require" using
> > >> SKILLS and "impair" those who do not use agents. Even though I'm known as
> > >> an AI and agent enthusiast, we should avoid making standard code parts or
> > >> development workflows inaccessible to those who don't want to use agents,
> > >> especially when avoiding that is easy.
> > >>
> > >> It's one thing to empower maintainers and contributors with SKILLS to
> > >> review or triage PRs if they want to, or for someone doing translation to
> > >> add a new phrase in a language. However, it's a different story when
> > >> discussing basic "code" tasks, like adding new exceptions. Ideally, those
> > >> tasks should not **require** you to use agents or be "difficult" without
> > >> them. We should totally respect people who choose not to use agents
> > >> themselves and ensure they do not feel like "lesser" people. Promoting
> > >> something and giving people new tools is one thing; making it a mandatory
> > >> part of the regular workflow when it isn't truly required is another.
> > >>
> > >> J.
> > >>
> > >>
> > >>
> > >> On Mon, May 11, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:
> > >>
> > >>> Maybe we should not have sequential IDs at all and do something similar
> > >>> to what SQLA does: https://sqlalche.me/e/20/xd2s for example (that’s
> > >>> `/e/<major><minor>/<code>`, which redirects).
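A SQLAlchemy-style versioned short link could be built from an error code as sketched below. The base URL is a placeholder assumption; the only scheme cited here is SQLAlchemy's `/e/<major><minor>/<code>`. The function name `error_docs_url` is hypothetical.

```python
# Hypothetical placeholder base URL; no Airflow short-link service is implied.
DOCS_BASE = "https://example.invalid/e"


def error_docs_url(error_code: str, airflow_version: str) -> str:
    """Build a versioned short link in the /e/<major><minor>/<code> shape."""
    major, minor = airflow_version.split(".")[:2]
    return f"{DOCS_BASE}/{major}{minor}/{error_code}"


print(error_docs_url("AERR-DAG-NOT-FOUND", "3.2.1"))
# https://example.invalid/e/32/AERR-DAG-NOT-FOUND
```

Concatenating major and minor versions the way SQLAlchemy does is only unambiguous while minor versions are single digits. For example, 3.10 and 31.0 would both become `310`, so a separator may be worth considering.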
> > >>>
> > >>> Some of the example(?) errors are internal to a single component and
> > >>> never exposed to users, so they shouldn’t be in the registry.
> > >>> AERR009/DagCodeNotFound, for instance, is likely thrown by the ORM layer
> > >>> and caught by the API server, which is to say it is entirely invisible to
> > >>> the user? I imagine there are many more in this category.
> > >>>
> > >>>
> > >>> AERR010 and AERR011 are both DagNotFound, but 11 is specifically for
> > >>> "Requested DAG could not be found for backfill operation"; it seems very
> > >>> odd to have a different error code for that.
> > >>>
> > >>> We also have provider-specific error codes in the main registry, which
> > >>> isn’t a pattern that will work (`user_facing_error_message: Google Ads
> > >>> link not found for the specified property`, etc.).
> > >>>
> > >>> -ash
> > >>>
> > >>>
> > >>>> On 11 May 2026, at 14:20, Ash Berlin-Taylor <[email protected]> wrote:
> > >>>>
> > >>>> If we do this (and I’m still not sure what I think overall), +1 to some
> > >>>> kind of grouping. Right now, for instance, the registry has AERR002 for
> > >>>> connection not found, but no space to add Variable not found or State
> > >>>> not found in the future.
> > >>>>> On 11 May 2026, at 12:25, Dev-iL <[email protected]> wrote:
> > >>>>>
> > >>>>> (please assume there's an "In my opinion, " prefix to every sentence)
> > >>>>>
> > >>>>> 0. Since the dev workflow is very structured, it can/should be made
> > >>>>> into a SKILL.
> > >>>>> 1. Long term yes, but while we refactor the existing code we should
> > >>>>> allow it (assuming it would trip hooks or CI)
> > >>>>> 2. YAML seems suitable at first glance
> > >>>>> 3. One code per exception makes sense to me. Depending on how we want
> > >>>>> the exception taxonomy to evolve, perhaps we want to have codes like
> > >>>>> ###.### for "parent" and "subclass" exceptions, or Ruff-style codes
> > >>>>> where #00 is a family of similar exceptions.
> > >>>>>
> > >>>>>
> > >>>>> On Mon, 11 May 2026, 12:15 Omkar P, <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hi team,
> > >>>>>>
> > >>>>>> Starting this thread to discuss the design of Airflow error codes.
> > >>>>>> These are LLM-friendly strings starting with AERR, which Airflow devs
> > >>>>>> can use when raising exceptions to convey the error context to dag
> > >>>>>> users in a succinct way. Providing current design details below.
> > >>>>>>
> > >>>>>> PR: https://github.com/apache/airflow/pull/65423
> > >>>>>>
> > >>>>>> Feature flow:
> > >>>>>> 1. An Airflow dev identifies an error case and defines a new error code
> > >>>>>> in the error mapping YAML (say AERR002).
> > >>>>>> 2. The dev then adds AirflowErrorCodeMixin to the respective exception
> > >>>>>> class that they'd want to raise with an error_code.
> > >>>>>> 3. The dev then specifies the error_code in the raise (e.g. raise
> > >>>>>> AirflowNotFoundException(..., error_code="AERR002")).
> > >>>>>> 4. The dev runs breeze build-docs, which generates a new docs page,
> > >>>>>> AERR002.rst.
> > >>>>>> 5. A breeze static check validates that the error code is mapped to
> > >>>>>> the correct exception class.
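The mixin-based flow above can be sketched as follows. This is a hypothetical re-creation; the actual `AirflowErrorCodeMixin` in the PR may look different. A dict stands in for the YAML mapping so that the sketch has no PyYAML dependency. Here the code-to-class check runs when the exception is constructed, purely for illustration. In the flow above, step 5 performs that validation as a static check instead.

```python
from typing import Optional

# Stand-in for the YAML error mapping (the PR reads a YAML file instead).
ERROR_REGISTRY = {
    "AERR002": {
        "exception": "AirflowNotFoundException",
        "description": "Connection not found",
    },
}


class AirflowErrorCodeMixin:
    """Hypothetical re-creation; the mixin in the PR may look different."""

    def __init__(self, *args, error_code: Optional[str] = None, **kwargs):
        if error_code is not None:
            entry = ERROR_REGISTRY.get(error_code)
            if entry is None or entry["exception"] != type(self).__name__:
                raise ValueError(f"{error_code} is not mapped to {type(self).__name__}")
        self.error_code = error_code
        super().__init__(*args, **kwargs)  # error_code is not passed on

    def __str__(self) -> str:
        message = super().__str__()
        # Users see the code in the stack trace, e.g. "[AERR002] ...".
        return f"[{self.error_code}] {message}" if self.error_code else message


class AirflowNotFoundException(AirflowErrorCodeMixin, Exception):
    """Simplified stand-in for Airflow's AirflowNotFoundException."""


print(AirflowNotFoundException("conn 'my_db' not found", error_code="AERR002"))
# [AERR002] conn 'my_db' not found
```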
> > >>>>>>
> > >>>>>> User side:
> > >>>>>> On the Airflow users' side, they now see the Airflow error code as
> > >>>>>> part of the stack trace, which they can use for communicating problems
> > >>>>>> instead of pasting verbose stack traces. Error codes also improve
> > >>>>>> LLM-based discovery of Airflow errors, as codes are much more
> > >>>>>> deterministic/well-defined than plain stack traces.
> > >>>>>>
> > >>>>>> Open questions:
> > >>>>>> 1. Should the error code be mandatory for all raises of an exception
> > >>>>>> class that uses them?
> > >>>>>> 2. Where should the error code info be stored? Is a YAML-based
> > >>>>>> registry good enough?
> > >>>>>> 3. Shall we have a 1:1 mapping between an error code and an exception
> > >>>>>> class? e.g. AirflowNotFoundException mapped only to AERR002, i.e. only
> > >>>>>> one error code. (The current implementation in the PR supports a
> > >>>>>> many-to-one mapping: one exception class <-> multiple error codes based
> > >>>>>> on the respective context.)
> > >>>>>>
> > >>>>>> Looking forward to your thoughts on the above open questions or any
> > >>>>>> other design suggestions you'd like to add, thanks!
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Omkar
> > >>>>>>
> > >>>
> >
>
