My take:

1. I see no problem with that.
2. If we can find a good skill for it - yes - but I hardly see what skill that would be :)
3. Yes, the description could simply be part of the base exception, and we could generate documentation from it.
4. I have no opinion here - but it should be consistent.
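
For point 3, here is a rough sketch of how it could look, purely as an illustration. The base class name `AirflowErrorCodeException`, the `description` attribute, and the `render_docs` helper are hypothetical names made up for this example, not what the PR uses. The error code is derived from the class name, the description lives on the exception class, and a docs listing is generated by walking the subclasses.

```python
import re


class AirflowErrorCodeException(Exception):
    """Hypothetical base class (name assumed for this sketch)."""

    # Human-readable description; assumed to be the source for generated docs.
    description = ""

    @classmethod
    def error_code(cls) -> str:
        # Drop the "Airflow" prefix and "Exception" suffix, then split CamelCase.
        # Assumes plain CamelCase words; all-caps acronyms would need extra care.
        name = cls.__name__.removeprefix("Airflow").removesuffix("Exception")
        words = re.findall(r"[A-Z][a-z0-9]*", name)
        return "-".join(["AERR", *(word.upper() for word in words)])

    def __str__(self) -> str:
        # Prefix the regular message with the derived code so it shows up in logs.
        return f"[{self.error_code()}] {super().__str__()}"


class AirflowDagNotFoundException(AirflowErrorCodeException):
    # Meant as a grouping parent; this sketch does not enforce "abstract".
    description = "The requested Dag could not be found."


class AirflowDagNotFoundBackfillException(AirflowDagNotFoundException):
    description = "The requested Dag could not be found for a backfill operation."


def iter_subclasses(cls):
    """Yield all direct and indirect subclasses of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from iter_subclasses(sub)


def render_docs(base) -> str:
    """Build a plain-text listing of error codes and descriptions."""
    classes = sorted(iter_subclasses(base), key=lambda c: c.error_code())
    return "\n".join(f"{c.error_code()}: {c.description}" for c in classes)


if __name__ == "__main__":
    print(AirflowDagNotFoundBackfillException("dag_id=example"))
    print(render_docs(AirflowErrorCodeException))
```

Catching `AirflowDagNotFoundException` would also catch the backfill variant, so the grouping works in both the class hierarchy and the generated codes, with no YAML involved.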

J.

On Tue, May 12, 2026 at 8:51 PM Omkar P <[email protected]> wrote:

> Hi team,
>
> Thanks for your inputs. Design suggestions in this discussion so far:
>
> 1. Exceptions should have the error code embedded inside them
> 2. Error code to Exception is to be a 1:1 mapping
> 3. Error code to be derived automatically from the error class name, e.g. AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL, but could be with some semantic grouping, e.g. SQLAlchemy style
> 4. YAML mapping to be removed, as 1 to 3 above simplify the design
>
> I'll look into updating the PR as per the above points. Meanwhile, there are some open questions; it would be great if you could help think them through:
>
> 1. Some exception types the user will never see, e.g. DagCodeNotFound. Should we have error codes for such exceptions as well?
> 2. For the SKILL file, since we see a use case, shall we have an optional skill for error codes, in case someone would like to use it?
> 3. With the YAML mapping removed, we need to decide whether to keep the error description, first steps, and docs link inside the exception class or someplace else. Would keeping it inside the exception class have any drawbacks?
> 4. Is deriving the error code directly from the exception class name (AirflowNotFoundException -> AERR-NOT-FOUND) the best way to group? Or could we group in a more semantic way, like component-wise or similar? e.g. AERR/SCHED/DAG/NOTFOUND, essentially saying the DAG was not found by or in the scheduler
>
> Regards,
> Omkar
>
> On Mon, May 11, 2026 at 9:30 PM Jens Scheffler <[email protected]> wrote:
>
> > Hi,
> >
> > +1 to Jarek and Ash. While I generally like the idea, I'd favor _not_ needing a manual mapping in YAML and no code lookup table.
> >
> > Assuming for 95% of cases a 1:1 error code to exception mapping is reasonable. If there are 5% of cases then it might be pretty easy to split exceptions or add a manual code for these special cases.
> > But for the vast majority it would be great to have zero maintenance. Automated mapping from Exception class to error code seems reasonable.
> >
> > And for sure very, very important would be to be able to support Providers in general. If this is only in core then it would be half-baked. Most exceptions in real life hopefully are generated in providers.
> >
> > Jens
> >
> > On 11.05.26 16:35, Jarek Potiuk wrote:
> > > Ah ... and one good thing about the auto-mapping idea. You know that saying: *The world is a slightly better place with every single line of yaml removed or not even created in the first place.* This is almost literally the quote from our "Monorepo" talk with Amogh in Talk Python To Me :).
> > >
> > > On Mon, May 11, 2026 at 4:33 PM Jarek Potiuk <[email protected]> wrote:
> > >
> > >> 1. Agree with the grouping idea. I think even originally when you discussed it Omkar - there were some "groups" of exceptions. AERR-DAG-NOTFOUND-BACKFILL seems like a more suitable short name than 0001, provided it is descriptive enough for you to easily understand what each error means. I would hate always having to look up the error code in a table or YAML file. We could have such a table generated and in docs, but essentially after seeing enough logs you should know what the short code means without memorizing the number. It's almost inhuman to force people to associate numeric values with meaning.
> > >>
> > >> 2. I think a 1-1 mapping of exception to code would be good. While a short error code is useful in logs, seeing the short name in the code when you "raise" them is counterproductive because it adds noise to something we already have: the Exception class name. On the other hand, such a class name looks way worse in the logs.
> > >>
> > >> 3. *Idea:* Why don't we just keep the correct naming convention for our Exceptions and map them into IDs automatically (e.g., AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL)? I think it ticks all the boxes:
> > >>
> > >> * 0 maintenance (just a hook to check if all exceptions follow the right conventions)
> > >> * 0 mapping
> > >> * Code friendly
> > >> * Log friendly
> > >> * You see what you get by looking at either the exception class or ID
> > >> * We can build an exception hierarchy that allows us to catch several exceptions (e.g., `AirflowDagNotFoundException` being an abstract (non-instantiable) parent of AirflowDagNotFoundBackfillException and AirflowDagNotFoundParsingException, for example)
> > >> * Grouping works naturally and without conscious thought, in both exception classes and IDs
> > >>
> > >> Essentially, no SKILL is needed for that.
> > >>
> > >> And BTW. I think none of our "coding" should really "require" using SKILLS and "impair" those who do not use agents. Even though I'm known as an AI and Agent enthusiast, we should avoid making standard code parts or development workflows inaccessible to those who don't want to use agents, especially if it's easy.
> > >>
> > >> It's one thing to empower maintainers and contributors with SKILLS to review or triage PRs if they want to or for someone doing translation to add a new phrase in a language. However, it's a different story when discussing basic "code" tasks, like adding new exceptions. Ideally, those tasks should not **require** you to use Agents or be "difficult" without them. We should totally respect people who choose not to use agents themselves and ensure they do not feel like "lesser" people.
> > >> Promoting something and giving people new tools is one thing; making it a mandatory part of the regular workflow when it isn't truly required is another.
> > >>
> > >> J.
> > >>
> > >> On Mon, May 11, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:
> > >>
> > >>> Maybe we should not have sequential IDs at all and do something similar to what SQLA does: https://sqlalche.me/e/20/xd2s for example (That’s `/e/<major><minor>/<code>` which redirects)
> > >>>
> > >>> Some of the example(?) errors are internal to a single component and never exposed to users, so shouldn’t be in the registry - AERR009/DagCodeNotFound for instance, is likely thrown by the ORM layer and caught by the API server, which is to say it is entirely invisible to the user? I imagine there are many more in this category.
> > >>>
> > >>> AERR010 and AERR011 are both DagNotFound, but 11 is specifically for "Requested DAG could not be found for backfill operation" - that seems very odd to have a different error code for that.
> > >>>
> > >>> We also have provider specific error codes in the main registry which isn’t a pattern that will work (`user_facing_error_message: Google Ads link not found for the specified property`) etc.
> > >>>
> > >>> -ash
> > >>>
> > >>>> On 11 May 2026, at 14:20, Ash Berlin-Taylor <[email protected]> wrote:
> > >>>>
> > >>>> If we do this (and I’m still not sure what I think overall) +1 to some kind of grouping. Right now for instance the registry has AERR002 for connection not found, but no space to add Variable not found, or State not found in the future.
> > >>>>
> > >>>>> On 11 May 2026, at 12:25, Dev-iL <[email protected]> wrote:
> > >>>>>
> > >>>>> (please assume there's a "In my opinion, " prefix to every sentence)
> > >>>>>
> > >>>>> 0. Since the dev workflow is very structured, it can/should be made into a SKILL.
> > >>>>> 1. Long term yes, but while we refactor the existing code we should allow it (assuming it trips hooks or CI)
> > >>>>> 2. YAML seems suitable at first glance
> > >>>>> 3. One code per exception makes sense to me. Depending on how we want the exception taxonomy to evolve, perhaps we want to have codes like ###.### for "parent" and "subclass" exceptions, or Ruff-style #00 will be a family of similar exceptions.
> > >>>>>
> > >>>>> On Mon, 11 May 2026, 12:15 Omkar P, <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hi team,
> > >>>>>>
> > >>>>>> Starting this thread to discuss the design of Airflow error codes. These are LLM-friendly strings starting with AERR, which airflow devs can use when raising exceptions, to convey the error context to dag users in a succinct way. Providing current design details below.
> > >>>>>>
> > >>>>>> PR: https://github.com/apache/airflow/pull/65423
> > >>>>>>
> > >>>>>> Feature flow:
> > >>>>>> 1. airflow dev identifies error case and defines a new error code in the error mapping yaml (say AERR002).
> > >>>>>> 2. dev then adds AirflowErrorCodeMixin to respective exception class that they'd want to raise with an error_code.
> > >>>>>> 3. dev then specifies the error_code in raise in code (e.g. raise AirflowNotFoundException(..., error_code="AERR002")).
> > >>>>>> 4. dev runs breeze build-docs that generates a new docs page AERR002.rst
> > >>>>>> 5. breeze static check takes care of validating if error code is mapped to correct exception class.
> > >>>>>>
> > >>>>>> User side:
> > >>>>>> On airflow users' side, they now see airflow error code as part of the stack trace, which they can use for communicating problems instead of pasting verbose stack traces. Error codes also improve LLM-based discovery of airflow errors as codes are much more deterministic/well-defined than plain stack traces.
> > >>>>>>
> > >>>>>> Open questions:
> > >>>>>> 1. Should the error code be mandatory for all raises of an exception class that uses them?
> > >>>>>> 2. Where should the error code info be stored? Is a YAML-based registry good enough?
> > >>>>>> 3. Shall we have a 1:1 mapping between an error code and exception class? e.g. AirflowNotFoundException mapped only to AERR002 i.e. only one error code. (current implementation in PR supports many to one mapping, one exception class <-> multiple error codes based on respective context).
> > >>>>>>
> > >>>>>> Look forward to your thoughts on above open questions or any other design suggestions you'd like to add, thanks!
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Omkar
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
