Re: [DISCUSS] Airflow error codes (AERR) design

Omkar P Wed, 20 May 2026 00:45:07 -0700

Hi, the updated docstring example (with proper formatting) is here:
https://github.com/apache/airflow/pull/65423#issuecomment-4495832898
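
For readers who cannot open the link, here is a minimal sketch of how
docstring-only error metadata could be turned into a doc page. The field
names mirror the quoted example below. The "Label: value" layout inside the
docstring, the stand-in AirflowException base class, and the helper names
parse_error_doc / render_error_page are assumptions for illustration, not
necessarily what the linked comment or the PR contains.

```python
import inspect


class AirflowException(Exception):
    """Stand-in for Airflow's base exception so the sketch runs on its own."""


class AirflowNotFoundException(AirflowException):
    """
    Raise when the requested object/resource is not available...

    User-facing error message: Requested resource was not found
    Description: This error occurs when Airflow is unable to locate...
    First steps: Verify that the requested...
    Documentation: https://airflow.apache.org/docs/...
    """


# Hypothetical section labels; they mirror the fields from the quoted example.
FIELDS = ("User-facing error message", "Description", "First steps", "Documentation")


def parse_error_doc(exc_class):
    """Split a docstring into a summary line and the labelled fields above."""
    doc = inspect.getdoc(exc_class) or ""
    lines = [line.strip() for line in doc.splitlines() if line.strip()]
    meta = {"summary": lines[0] if lines else ""}
    for line in lines[1:]:
        label, sep, value = line.partition(": ")
        if sep and label in FIELDS:
            meta[label] = value
    return meta


def render_error_page(exc_class):
    """Render a plain-text doc entry for one exception class."""
    meta = parse_error_doc(exc_class)
    name = exc_class.__name__
    out = [name, "=" * len(name), "", meta["summary"], ""]
    out += [f"{label}: {meta[label]}" for label in FIELDS if label in meta]
    return "\n".join(out)


print(render_error_page(AirflowNotFoundException))
```

Nothing is added to the class itself in this sketch, so a provider
exception would be handled the same way as long as its docstring follows
the same layout.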


Regards,
Omkar

On Wed, May 20, 2026 at 7:11 AM Omkar P <[email protected]> wrote:

> Hi team,
>
> Thanks for deliberating on this and sorry for the delayed reply.
>
> I agree with Ash: error codes seem pointless if we're going with
> narrower exception classes.
>
> In my opinion, error codes can ONLY add value for Airflow if they
> are assigned by root cause or context. For example, raise
> AirflowNotFoundException(error_code="AERR-SECRET-NOT-FOUND")
> conveys that it's a not-found error, but in context it's a
> **secrets** not-found problem.
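
A minimal sketch of what context-wise codes could look like. The
error_code keyword and the AERR-SECRET-NOT-FOUND value come from the
example above. The stand-in classes, the constructor signature, the helper
functions, and the second code AERR-CONNECTION-NOT-FOUND are hypothetical;
they only illustrate that one exception class can carry different codes
depending on where it is raised.

```python
class AirflowException(Exception):
    """Stand-in for Airflow's base exception so the sketch runs on its own."""


class AirflowNotFoundException(AirflowException):
    """Stand-in for illustration; the constructor signature is an assumption."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.error_code = error_code

    def __str__(self):
        base = super().__str__()
        return f"[{self.error_code}] {base}" if self.error_code else base


def get_secret(store, key):
    # Same exception type as below, but the code says a *secret* is missing.
    if key not in store:
        raise AirflowNotFoundException(
            f"Secret {key!r} not found", error_code="AERR-SECRET-NOT-FOUND"
        )
    return store[key]


def get_connection(store, conn_id):
    # Hypothetical second context with its own (made-up) code.
    if conn_id not in store:
        raise AirflowNotFoundException(
            f"Connection {conn_id!r} not found",
            error_code="AERR-CONNECTION-NOT-FOUND",
        )
    return store[conn_id]


try:
    get_secret({}, "db_password")
except AirflowNotFoundException as exc:
    print(exc)
```

With a strict 1:1 mapping, both call sites would report the same code and
the code would add nothing over the class name.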
>
> But in this thread the inclination has been more towards a 1:1 mapping
> of error code to exception, rightly, to avoid complexity. That also makes
> error codes redundant, and the side benefit of reduced LLM token use (NOT
> to be misunderstood as generally better LLM performance) is simply not
> enough reason for the additional complexity.
>
> So yes, unless we're talking about using error codes context-wise, I
> suppose it'll be best to leave it at this and focus our energy on the main
> reason we started this discussion: better error messages (as Jarek
> and Sameer rightly mentioned).
>
> **Better** as in there should be comprehensive docs (why it occurred,
> what the user should do) for every exception, and not just "Raised when
> this happened...". This is particularly needed for providers, where
> exceptions are more likely to be generated (as Jens mentioned).
> This continues what we already discussed above in this thread about
> embedding error info in the exception class (
> https://github.com/apache/airflow/pull/65423/files#diff-743dc549559a0d0682597ce4d917ac237c145af5a53bbb386aaf1cae24adb1eeR57-R82
> ).
>
> We could simply put the error metadata in the exception docstring:
>
> class AirflowNotFoundException(AirflowException):
>     """Raise when the requested object/resource is not available..."""
>
>     user_facing_error_message = "Requested resource was not found"
>     description = "This error occurs when Airflow is unable to locate..."
>     first_steps = "Verify that the requested..."
>     documentation = "https://airflow.apache.org/docs/..."
>
> No error mixin or additional properties, just a plain docstring used to
> generate the doc page for exception classes. Breaking changes won't be
> required, and support for providers will come out of the box with the
> docstring.
>
> Let me know if you have any thoughts or concerns on this approach, thanks.
>
> Regards,
> Omkar
>
> On Mon, May 18, 2026 at 9:26 PM Jarek Potiuk <[email protected]> wrote:
>
>> > I am +1 on Jarek's grouping idea but even then I am not really sure what
>> > additional value it gives over just having better/descriptive error
>> > messages.
>>
>> > I think better error messages themselves would probably help users more
>> > than error codes. Clear exceptions with actionable context are already
>> > searchable and reasonably LLM-friendly without introducing another
>> > taxonomy layer on top of Python exceptions.
>>
>> Changing my vote to -0.5. I thought about it a lot, and this is actually
>> the most valid point. Possibly what we **really** want is to ensure our
>> error descriptions are good. I also discussed it today with a friend. The
>> point of the discussion was that the documentation should be easy for
>> agents to read. Surprisingly, unique error codes are quite a bit better
>> for humans than for agents: agents will find their way in even slightly
>> chaotic text as long as the description of the error is good and somewhat
>> actionable.
>>
>> J
>> ><
>>
>>
>> On Mon, May 18, 2026 at 10:21 PM Sameer Mesiah <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > Thanks for starting this discussion.
>> >
>> > Overall, I am +0 on this. I have to agree somewhat with Ash here because
>> > I am struggling to see how these AERR codes add enough value to justify
>> > the additional complexity/process around them.
>> >
>> > I can see some benefit from an observability/aggregation perspective
>> > (timeouts, auth failures, transient network issues, etc.) but I am not
>> > convinced globally defined sequential error codes are the right solution
>> > for this, especially because most operational exceptions in Airflow come
>> > from providers, not core. I agree with Jens here that this feature
>> > becomes nearly useless if it does not cover providers.
>> >
>> > I am +1 on Jarek's grouping idea but even then I am not really sure what
>> > additional value it gives over just having better/descriptive error
>> > messages.
>> >
>> > One thing that stands out to me (which has already been hinted at by
>> > Ash) is that providers have increasingly been moving away from wrapping
>> > everything in AirflowException and instead preserving native SDK/runtime
>> > exceptions and/or using provider-specific exceptions where it makes
>> > sense. I am not really sure how that direction fits with a centralized
>> > AERR registry (if you can clarify this that would be great!). Otherwise
>> > we end up in a situation where core exceptions have codes while provider
>> > exceptions (which are the majority of real-world failures) do not.
>> >
>> > To me, the strongest argument for structured IDs/categories would
>> > actually be observability systems that need a strict
>> > filtering/aggregation field. I can definitely see value there. But I
>> > think the LLM argument is weaker since LLMs already work reasonably well
>> > with descriptive exception messages/stack traces as-is.
>> >
>> > The provider rollout also feels tricky to me. For example, if a user is
>> > running the latest version of provider A (with this error system) but an
>> > older version of provider B (without it), then the same Airflow
>> > deployment suddenly produces two different styles of errors depending on
>> > which provider raised the exception. That inconsistency feels difficult
>> > to avoid in an independently versioned provider ecosystem like Airflow’s
>> > (and I don't believe implementing this in the common provider is enough
>> > to mitigate this concern).
>> >
>> > I think better error messages themselves would probably help users more
>> > than error codes. Clear exceptions with actionable context are already
>> > searchable and reasonably LLM-friendly without introducing another
>> > taxonomy layer on top of Python exceptions.
>> >
>> > Thanks,
>> > Sameer Mesiah.
>> >
>>
>
