Hi,

Thanks for starting this discussion.

Overall, I am +0 on this. I have to agree somewhat with Ash here because I
am struggling to see how these AERR codes add enough value to justify the
additional complexity/process around them.

I can see some benefit from an observability/aggregation perspective
(timeouts, auth failures, transient network issues, etc.), but I am not
convinced globally defined sequential error codes are the right solution
for this, especially because most operational exceptions in Airflow come
from providers, not core. I agree with Jens here that this feature becomes
nearly useless if it does not cover providers.

I am +1 on Jarek's grouping idea but even then I am not really sure what
additional value it gives over just having better/descriptive error
messages.

One thing that stands out to me (which has already been hinted at by Ash)
is that providers have increasingly been moving away from wrapping
everything in AirflowException and instead preserving native SDK/runtime
exceptions and/or using provider-specific exceptions where it makes sense.
I am not really sure how that direction fits with a centralized AERR
registry (if you can clarify this, that would be great!). Otherwise we end
up in a situation where core exceptions have codes while provider
exceptions (which are the majority of real-world failures) do not.

To me, the strongest argument for structured IDs/categories would actually
be observability systems that need a strict filtering/aggregation field. I
can definitely see value there. But I think the LLM argument is weaker
since LLMs already work reasonably well with descriptive exception
messages/stack traces as-is.

The provider rollout also feels tricky to me. For example, if a user is
running the latest version of provider A (with this error system) but an
older version of provider B (without it), then the same Airflow deployment
suddenly produces two different styles of errors depending on which
provider raised the exception. That inconsistency feels difficult to avoid
in an independently versioned provider ecosystem like Airflow’s (and I
don't believe implementing this in the common provider is enough to
mitigate this concern).

I think better error messages themselves would probably help users more
than error codes. Clear exceptions with actionable context are already
searchable and reasonably LLM-friendly without introducing another taxonomy
layer on top of Python exceptions.
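
As a rough illustration of that last point, here is a minimal sketch of an
exception that carries actionable context without any code registry. All
names in it (ConnectionTimeoutError, describe, the field names) are
hypothetical and made up for this example; none of them are existing
Airflow or provider APIs.

```python
class ConnectionTimeoutError(TimeoutError):
    """Hypothetical provider-specific exception (not an Airflow class).

    It subclasses the built-in TimeoutError, so callers that catch the
    native exception type keep working.
    """

    def __init__(self, conn_id: str, host: str, timeout_s: float) -> None:
        self.conn_id = conn_id
        self.host = host
        self.timeout_s = timeout_s
        super().__init__(
            f"Connection '{conn_id}' to {host} timed out after {timeout_s:g}s. "
            "Check that the host is reachable from the worker, or increase "
            "the configured timeout."
        )


def describe(exc: Exception) -> dict:
    """Build a log-friendly record from the exception type and attributes."""
    return {
        # The class name already works as a filtering/aggregation field.
        "error_type": type(exc).__name__,
        "message": str(exc),
        # Public instance attributes give the structured context.
        "context": {
            k: v for k, v in vars(exc).items() if not k.startswith("_")
        },
    }
```

In this sketch the exception type doubles as the category, and the message
stays searchable, which is roughly what a code would otherwise provide.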

Thanks,
Sameer Mesiah.
