Ah ... and one good thing about the auto-mapping idea. You know that saying: *The world is a slightly better place with every single line of YAML removed or not even created in the first place.* This is almost literally the quote from our "Monorepo" talk with Amogh on Talk Python To Me :).
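
A minimal sketch of what the auto-mapping from the quoted mail below could look like, deriving the ID purely from the exception class name so no YAML is involved. The helper name `error_id_for` and the regex-based CamelCase splitting are assumptions for illustration, not code from the PR; the class names are the ones used as examples in the thread.

```python
import re


def error_id_for(exc_class: type, prefix: str = "AERR") -> str:
    """Derive a log-friendly ID from an exception class name (hypothetical helper)."""
    name = exc_class.__name__
    # Strip the conventional "Airflow" prefix and "Exception" suffix.
    name = name.removeprefix("Airflow").removesuffix("Exception")
    # Split the remaining CamelCase name into words.
    # Note: an all-caps acronym such as "DAG" would be split letter by letter,
    # so this sketch assumes names are written like "Dag".
    words = re.findall(r"[A-Z][a-z0-9]*", name)
    return "-".join([prefix, *(word.upper() for word in words)])


class AirflowDagNotFoundException(Exception):
    """Parent of the "DAG not found" group (non-instantiability is not enforced here)."""


class AirflowDagNotFoundBackfillException(AirflowDagNotFoundException):
    pass


class AirflowDagNotFoundParsingException(AirflowDagNotFoundException):
    pass


# Catching the parent handles every exception in the group,
# and the ID is computed from whichever concrete class was raised.
try:
    raise AirflowDagNotFoundBackfillException("dag_id 'example' not found")
except AirflowDagNotFoundException as exc:
    caught_id = error_id_for(type(exc))
```

With this shape, the only thing left to maintain would be a hook that checks class names follow the convention; grouping falls out of the shared name prefix.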

On Mon, May 11, 2026 at 4:33 PM Jarek Potiuk <[email protected]> wrote:

> 1. Agree with the grouping idea. I think even originally when you
> discussed it, Omkar, there were some "groups" of exceptions.
> AERR-DAG-NOTFOUND-BACKFILL seems like a more suitable short name than
> 0001, provided it is descriptive enough for you to easily understand
> what each error means. I would hate always having to look up the error
> code in a table or YAML file. We could have such a table generated and
> in the docs, but essentially after seeing enough logs you should know
> what the short code means without memorizing the number. It's almost
> inhuman to force people to associate numeric values with meaning.
>
> 2. I think a 1-1 mapping of exception to code would be good. While a
> short error code is useful in logs, seeing the short name in the code
> when you "raise" them is counterproductive because it adds noise to
> something we already have: the Exception class name. On the other hand,
> such a class name looks way worse in the logs.
>
> 3. *Idea:* Why don't we just keep the correct naming convention for our
> Exceptions and map them into IDs automatically (e.g.,
> AirflowDagNotFoundBackfillException -> AERR-DAG-NOT-FOUND-BACKFILL)? I
> think it ticks all the boxes:
>
> * 0 maintenance (just a hook to check if all exceptions follow the right
>   conventions)
> * 0 mapping
> * Code friendly
> * Log friendly
> * You see what you get by looking at either the exception class or ID
> * We can build an exception hierarchy that allows us to catch several
>   exceptions (e.g., `AirflowDagNotFoundException` being an abstract
>   (non-instantiable) parent of AirflowDagNotFoundBackfillException and
>   AirflowDagNotFoundParsingException, for example)
> * Grouping works naturally and without conscious thought, in both
>   exception classes and IDs
>
> Essentially, no SKILL is needed for that.
>
> And BTW. I think none of our "coding" should really "require" using
> SKILLS and "impair" those who do not use agents. Even though I'm known as
> an AI and Agent enthusiast, we should avoid making standard code parts or
> development workflows inaccessible to those who don't want to use agents,
> especially if it's easy.
>
> It's one thing to empower maintainers and contributors with SKILLS to
> review or triage PRs if they want to or for someone doing translation to
> add a new phrase in a language. However, it's a different story when
> discussing basic "code" tasks, like adding new exceptions. Ideally, those
> tasks should not **require** you to use Agents or be "difficult" without
> them. We should totally respect people who choose not to use agents
> themselves and ensure they do not feel like "lesser" people. Promoting
> something and giving people new tools is one thing; making it a mandatory
> part of the regular workflow when it isn't truly required is another.
>
> J.
>
> On Mon, May 11, 2026 at 3:30 PM Ash Berlin-Taylor <[email protected]> wrote:
>
>> Maybe we should not have sequential IDs at all and do something similar
>> to what SQLA does: https://sqlalche.me/e/20/xd2s for example (That’s
>> `/e/<major><minor>/<code>` which redirects)
>>
>> Some of the example(?) errors are internal to a single component and
>> never exposed to users, so shouldn’t be in the registry -
>> AERR009/DagCodeNotFound for instance, is likely thrown by the ORM layer
>> and caught by the API server, which is to say it is entirely invisible
>> to the user? I imagine there are many more in this category.
>>
>> AERR010 and AERR011 are both DagNotFound, but 11 is specifically for
>> "Requested DAG could not be found for backfill operation"; it seems very
>> odd to have a different error code for that.
>>
>> We also have provider specific error codes in the main registry which
>> isn’t a pattern that will work (`user_facing_error_message: Google Ads
>> link not found for the specified property`) etc.
>>
>> -ash
>>
>> > On 11 May 2026, at 14:20, Ash Berlin-Taylor <[email protected]> wrote:
>> >
>> > If we do this (and I’m still not sure what I think overall) +1 to some
>> > kind of grouping. Right now for instance the registry has AERR002 for
>> > connection not found, but no space to add Variable not found, or State
>> > not found in the future.
>> >
>> >> On 11 May 2026, at 12:25, Dev-iL <[email protected]> wrote:
>> >>
>> >> (please assume there's a "In my opinion, " prefix to every sentence)
>> >>
>> >> 0. Since the dev workflow is very structured, it can/should be made
>> >> into a SKILL.
>> >> 1. Long term yes, but while we refactor the existing code we should
>> >> allow it (assuming it trips hooks or CI)
>> >> 2. YAML seems suitable at first glance
>> >> 3. One code per exception makes sense to me. Depending on how we want
>> >> the exception taxonomy to evolve, perhaps we want to have codes like
>> >> ###.### for "parent" and "subclass" exceptions, or Ruff-style #00
>> >> will be a family of similar exceptions.
>> >>
>> >> On Mon, 11 May 2026, 12:15 Omkar P, <[email protected]> wrote:
>> >>
>> >>> Hi team,
>> >>>
>> >>> Starting this thread to discuss the design of Airflow error codes.
>> >>> These are LLM-friendly strings starting with AERR, which airflow
>> >>> devs can use when raising exceptions, to convey the error context to
>> >>> dag users in a succinct way. Providing current design details below.
>> >>>
>> >>> PR: https://github.com/apache/airflow/pull/65423
>> >>>
>> >>> Feature flow:
>> >>> 1. airflow dev identifies an error case and defines a new error code
>> >>> in the error mapping yaml (say AERR002).
>> >>> 2. dev then adds AirflowErrorCodeMixin to the respective exception
>> >>> class that they'd want to raise with an error_code.
>> >>> 3. dev then specifies the error_code in the raise in code (e.g.
>> >>> raise AirflowNotFoundException(..., error_code="AERR002")).
>> >>> 4. dev runs breeze build-docs, which generates a new docs page
>> >>> AERR002.rst
>> >>> 5. breeze static check takes care of validating that the error code
>> >>> is mapped to the correct exception class.
>> >>>
>> >>> User side:
>> >>> On airflow users' side, they now see the airflow error code as part
>> >>> of the stack trace, which they can use for communicating problems
>> >>> instead of pasting verbose stack traces. Error codes also improve
>> >>> LLM-based discovery of airflow errors as codes are much more
>> >>> deterministic/well-defined than plain stack traces.
>> >>>
>> >>> Open questions:
>> >>> 1. Should the error code be mandatory for all raises of an exception
>> >>> class that uses them?
>> >>> 2. Where should the error code info be stored? Is a YAML-based
>> >>> registry good enough?
>> >>> 3. Shall we have a 1:1 mapping between an error code and exception
>> >>> class? e.g. AirflowNotFoundException mapped only to AERR002 i.e.
>> >>> only one error code. (The current implementation in the PR supports
>> >>> a many-to-one mapping: one exception class <-> multiple error codes,
>> >>> based on the respective context.)
>> >>>
>> >>> Look forward to your thoughts on the above open questions or any
>> >>> other design suggestions you'd like to add, thanks!
>> >>>
>> >>> Regards,
>> >>> Omkar
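
For comparison, the feature flow in Omkar's original mail can be sketched roughly as follows. This is only an illustration under stated assumptions: `ERROR_REGISTRY` is a hypothetical in-memory stand-in for the error mapping YAML, the body of `AirflowErrorCodeMixin` is guessed (only its name appears in the thread), and `is_registered` is a hypothetical runtime imitation of what the breeze static check is described as validating.

```python
from typing import Optional

# Hypothetical stand-in for the error mapping YAML, keyed by error code.
ERROR_REGISTRY = {
    "AERR002": {
        "exception": "AirflowNotFoundException",
        "description": "Connection not found",
    },
}


class AirflowErrorCodeMixin:
    """Guessed shape of the mixin: stores an optional error_code on the exception."""

    def __init__(self, *args, error_code: Optional[str] = None):
        self.error_code = error_code
        super().__init__(*args)

    def __str__(self) -> str:
        # Prefix the message with the code so it shows up in the stack trace.
        base = super().__str__()
        return f"[{self.error_code}] {base}" if self.error_code else base


class AirflowNotFoundException(AirflowErrorCodeMixin, Exception):
    pass


def is_registered(exc: AirflowErrorCodeMixin) -> bool:
    """Check that the code exists and is mapped to this exception class."""
    entry = ERROR_REGISTRY.get(exc.error_code)
    return entry is not None and entry["exception"] == type(exc).__name__


exc = AirflowNotFoundException("Connection 'my_conn' not found", error_code="AERR002")
```

The registry and the raise site both have to name the code here, which is the duplication the auto-mapping proposal at the top of this mail tries to avoid.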
