Hi team,

Thanks for your input. The design suggestions from this discussion so far:

1. Exceptions should have the error code embedded inside them.
2. Error code to exception should be a 1:1 mapping.
3. The error code should be derived automatically from the exception class
name, e.g. AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL,
possibly with some semantic grouping, e.g. SQLAlchemy style.
4. The YAML mapping should be removed, as points 1 to 3 simplify the design.
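
To make points 1 to 3 concrete, here is a minimal sketch of how the code
could be derived from the class name. It is only an illustration:
derive_error_code, the stand-in AirflowException base class, the acronym
handling in the regex and the "[CODE] message" format are all assumptions
made for this example, not what the PR currently does.

```python
import re

# Hypothetical word splitter: an acronym run ("DAG") or a capitalised word ("Dag").
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*")


def derive_error_code(class_name: str, prefix: str = "AERR") -> str:
    """Map e.g. AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL."""
    # Drop the conventional prefix/suffix, then split the rest into words.
    name = class_name.removeprefix("Airflow").removesuffix("Exception")
    words = _WORD.findall(name)
    return "-".join([prefix, *(word.upper() for word in words)])


class AirflowException(Exception):
    """Stand-in base class for this sketch, not the real Airflow class."""

    error_code = "AERR"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # 1:1 mapping: each subclass gets exactly one code, fixed at class creation.
        cls.error_code = derive_error_code(cls.__name__)

    def __str__(self) -> str:
        # Embed the code in the message so it shows up in logs and tracebacks.
        return f"[{self.error_code}] {super().__str__()}"


class AirflowDagNotFoundException(AirflowException):
    pass


class AirflowDagNotFoundBackfillException(AirflowDagNotFoundException):
    pass
```

With this, the message of a raised AirflowDagNotFoundBackfillException
starts with [AERR-DAG-NOT-FOUND-BACKFILL], and an
"except AirflowDagNotFoundException" block also catches the backfill variant.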

I'll look into updating the PR as per the points above. Meanwhile, there are
some open questions that it would be great if you could help think through:

1. Some exception types are never seen by the user, e.g. DagCodeNotFound.
Should we have error codes for such exceptions as well?
2. Since we see a use case for the SKILL file, shall we have an optional
skill for error codes, in case someone would like to use it?
3. With the YAML mapping removed, we need to decide whether to keep the
error description, first steps and docs link inside the exception class or
someplace else. Would keeping them inside the exception class have any
drawbacks?
4. Is deriving the error code directly from the exception class name
(AirflowNotFoundException -> AERR-NOT-FOUND) the best way to group? Or
could we group in a more semantic way, like component-wise or similar?
e.g. AERR/SCHED/DAG/NOTFOUND, essentially saying that the DAG was not found
by or in the scheduler.
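
For questions 3 and 4, a rough sketch of what "metadata inside the exception
class" plus component-wise grouping might look like. Everything here is
hypothetical: the CodedAirflowError and SchedulerDagNotFound classes, the
attribute names, and the example first step are invented for illustration.
The docs link is left empty because no URL scheme has been decided.

```python
from typing import ClassVar


class CodedAirflowError(Exception):
    """Hypothetical base class; all attribute names are illustrative only."""

    component: ClassVar[str] = ""  # e.g. "SCHED"
    subject: ClassVar[str] = ""    # e.g. "DAG"
    condition: ClassVar[str] = ""  # e.g. "NOTFOUND"
    description: ClassVar[str] = ""
    first_steps: ClassVar[tuple] = ()
    docs_link: ClassVar[str] = ""

    @classmethod
    def error_code(cls) -> str:
        # Join only the parts that are set, so partial groupings still work.
        parts = ("AERR", cls.component, cls.subject, cls.condition)
        return "/".join(part for part in parts if part)

    @classmethod
    def help_text(cls) -> str:
        # Render the class-level metadata as a short block for logs or docs.
        lines = [f"{cls.error_code()}: {cls.description}"]
        lines += [f"  {n}. {step}" for n, step in enumerate(cls.first_steps, start=1)]
        if cls.docs_link:
            lines.append(f"  Docs: {cls.docs_link}")
        return "\n".join(lines)


class SchedulerDagNotFound(CodedAirflowError):
    component = "SCHED"
    subject = "DAG"
    condition = "NOTFOUND"
    description = "DAG was not found by or in the scheduler."
    # Made-up example step, not real Airflow guidance.
    first_steps = ("Check that the DAG file is present and parses without errors.",)
```

Here SchedulerDagNotFound.error_code() gives AERR/SCHED/DAG/NOTFOUND. One
possible trade-off of this layout is that every wording change to the
description or first steps becomes a code change, and the grouping fields
have to be maintained by hand rather than derived from the class name.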

Regards,
Omkar

On Mon, May 11, 2026 at 9:30 PM Jens Scheffler <[email protected]> wrote:

> Hi,
>
> +1 to Jarek and Ash. While I generally like the idea, I'd favor _not_
> needing a manual mapping in YAML or a code lookup table.
>
> I assume that for 95% of cases a 1:1 error code to exception mapping is
> reasonable. For the remaining 5% of cases it might be pretty easy to
> split exceptions or add a manual code for these special cases. But
> it would be great if the vast majority were zero maintenance. Automated
> mapping from Exception class to error code seems reasonable.
>
> And for sure very very important would be to be able to support
> Providers in general. If this is only in core then it would be
> half-baked. Most exceptions in real life hopefully are generated in
> providers.
>
> Jens
>
> On 11.05.26 16:35, Jarek Potiuk wrote:
> > Ah ... and one good thing about the auto-mapping idea. You know that
> > saying: *The world is a slightly better place with every single line of
> > yaml removed or not even created in the first place.* This is almost
> > literally the quote from our "Monorepo" talk with Amogh in Talk Python To
> > Me :).
> >
> > On Mon, May 11, 2026 at 4:33 PM Jarek Potiuk <[email protected]> wrote:
> >
> >> 1. Agree with the grouping idea. I think even originally when you
> >> discussed it Omkar - there were some "groups" of exceptions.
> >> AERR-DAG-NOTFOUND-BACKFILL seems like a more suitable short name than 0001,
> >> provided it is descriptive enough for you to easily understand what each
> >> error means. I would hate always having to look up the error code in a
> >> table or YAML file. We could have such a table generated and in docs, but
> >> essentially after seeing enough logs you should know what the short code
> >> means without memorizing the number. It's almost inhuman to force people to
> >> associate numeric values with meaning.
> >>
> >> 2. I think 1-1 mapping exception to the code would be good. While a short
> >> error code is useful in logs, seeing the short name in the code when you
> >> "raise" them is counterproductive because it adds noise to something we
> >> already have: the Exception Class name. On the other hand, such a class
> >> name looks way worse in the logs.
> >>
> >> 3. *Idea:* Why don't we just keep the correct naming convention for our
> >> Exceptions and map them into IDs automatically (e.g.,
> >> AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL)? I
> >> think it ticks all the boxes:
> >>
> >> * 0 maintenance (just a hook to check if all exceptions follow the right
> >> conventions)
> >> * 0 mapping
> >> * Code friendly
> >> * Log friendly
> >> * You see what you get by looking at either the exception class or ID
> >> * We can build an exception hierarchy that allows us to catch several
> >> exceptions (e.g., `AirflowDagNotFoundException` being an abstract
> >> (non-instantiable) parent of AirflowDagNotFoundBackfillException and
> >> AirflowDagNotFoundParsingException, for example)
> >> * Grouping works naturally and without conscious thought, in both exception
> >> classes and IDs
> >>
> >> Essentially, no SKILL is needed for that.
> >>
> >> And BTW. I think none of our "coding" should really "require" using
> >> SKILLS and "impair" those who do not use agents. Even though I'm known as
> >> an AI and Agent enthusiast, we should avoid making standard code parts or
> >> development workflows inaccessible to those who don't want to use agents,
> >> especially if it's easy.
> >>
> >> It's one thing to empower maintainers and contributors with SKILLS to
> >> review or triage PRs if they want to or for someone doing translation to
> >> add a new phrase in a language. However, it's a different story when
> >> discussing basic "code" tasks, like adding new exceptions. Ideally, those
> >> tasks should not **require** you to use Agents or be "difficult" without
> >> them. We should totally respect people who choose not to use agents
> >> themselves and ensure they do not feel like "lesser" people. Promoting
> >> something and giving people new tools is one thing; making it a mandatory
> >> part of the regular workflow when it isn't truly required is another.
> >>
> >> J.
> >>
> >>
> >>
> >> On Mon, May 11, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:
> >>
> >>> Maybe we should not have sequential IDs at all and do something similar
> >>> to what SQLA does: https://sqlalche.me/e/20/xd2s for example (That’s
> >>> `/e/<major><minor>/<code>` which redirects)
> >>>
> >>> Some of the example(?) errors are internal to a single component and
> >>> never exposed to users, so shouldn’t be in the registry -
> >>> AERR009/DagCodeNotFound for instance, is likely thrown by the ORM layer and
> >>> caught by the API server, which is to say it is entirely invisible to the
> >>> user? I imagine there are many more in this category.
> >>>
> >>>
> >>> AERR010 and AERR011 are both DagNotFound, but 11 is specifically for
> >>> "Requested DAG could not be found for backfill operation". It seems very
> >>> odd to have a different error code for that.
> >>>
> >>> We also have provider specific error codes in the main registry which
> >>> isn’t a pattern that will work (`user_facing_error_message: Google Ads link
> >>> not found for the specified property`) etc.
> >>>
> >>> -ash
> >>>
> >>>
> >>>> On 11 May 2026, at 14:20, Ash Berlin-Taylor <[email protected]> wrote:
> >>>>
> >>>> If we do this (and I’m still not sure what I think overall) +1 to some
> >>>> kind of grouping. Right now for instance the registry has AERR002 for
> >>>> connection not found, but no space to add Variable not found, or State not
> >>>> found in the future.
> >>>>> On 11 May 2026, at 12:25, Dev-iL <[email protected]> wrote:
> >>>>>
> >>>>> (please assume there's a "In my opinion, " prefix to every sentence)
> >>>>>
> >>>>> 0. Since the dev workflow is very structured, it can/should be made into a
> >>>>> SKILL.
> >>>>> 1. Long term yes, but while we refactor the existing code we should allow
> >>>>> it (assuming it trip hooks or CI)
> >>>>> 2. YAML seems suitable at first glance
> >>>>> 3. One code per exception makes sense to me. Depending on how we want the
> >>>>> exception taxonomy to evolve, perhaps we want to have codes like ###.###
> >>>>> for "parent" and "subclass" exceptions, or Ruff-style #00 will be a family
> >>>>> of similar exceptions.
> >>>>>
> >>>>>
> >>>>> On Mon, 11 May 2026, 12:15 Omkar P, <[email protected]> wrote:
> >>>>>
> >>>>>> Hi team,
> >>>>>>
> >>>>>> Starting this thread to discuss the design of Airflow error codes. These
> >>>>>> are LLM-friendly strings starting with AERR, which airflow devs can use
> >>>>>> when raising exceptions, to convey the error context to dag users in a
> >>>>>> succinct way. Providing current design details below.
> >>>>>>
> >>>>>> PR: https://github.com/apache/airflow/pull/65423
> >>>>>>
> >>>>>> Feature flow:
> >>>>>> 1. airflow dev identifies error case and defines a new error code in the
> >>>>>> error mapping yaml (say AERR002).
> >>>>>> 2. dev then adds AirflowErrorCodeMixin to respective exception class
> >>>>>> that they'd want to raise with an error_code.
> >>>>>> 3. dev then specifies the error_code in raise in code (e.g. raise
> >>>>>> AirflowNotFoundException(..., error_code="AERR002")).
> >>>>>> 4. dev runs breeze build-docs that generates a new docs page AERR002.rst
> >>>>>> 5. breeze static check takes care of validating if error code is mapped
> >>>>>> to correct exception class.
> >>>>>>
> >>>>>> User side:
> >>>>>> On airflow users' side, they now see airflow error code as
> >>>>>> part of the stack trace, which they can use for communicating problems
> >>>>>> instead of pasting verbose stack traces. Error codes also improve
> >>>>>> LLM-based discovery of airflow errors as codes are much more
> >>>>>> deterministic/well-defined than plain stack traces.
> >>>>>>
> >>>>>> Open questions:
> >>>>>> 1. Should the error code be mandatory for all raises of an exception
> >>>>>> class that uses them?
> >>>>>> 2. Where should the error code info be stored? Is a YAML-based registry
> >>>>>> good enough?
> >>>>>> 3. Shall we have a 1:1 mapping between an error code and exception
> >>>>>> class? e.g. AirflowNotFoundException mapped only to AERR002 i.e. only one
> >>>>>> error code. (current implementation in PR supports many to one mapping,
> >>>>>> one exception class <-> multiple error codes based on respective context).
> >>>>>> Look forward to your thoughts on above open questions or any other
> >>>>>> design suggestions you'd like to add, thanks!
> >>>>>>
> >>>>>> Regards,
> >>>>>> Omkar
> >>>>>>
> >>>
>
