Hi,

Thanks for starting this discussion.

Overall, I am +0 on this. I have to agree somewhat with Ash here because I am struggling to see how these AERR codes add enough value to justify the additional complexity and process around them. I can see some benefit from an observability/aggregation perspective (timeouts, auth failures, transient network issues, etc.), but I am not convinced that globally defined sequential error codes are the right solution for this, especially because most operational exceptions in Airflow come from providers, not core. I agree with Jens that this feature becomes nearly useless if it does not cover providers. I am +1 on Jarek's grouping idea, but even then I am not really sure what additional value it gives over simply having better, more descriptive error messages.

One thing that stands out to me (which Ash has already hinted at) is that providers have increasingly been moving away from wrapping everything in AirflowException and instead preserving native SDK/runtime exceptions and/or using provider-specific exceptions where it makes sense. I am not really sure how that direction fits with a centralized AERR registry (if you can clarify this, that would be great!). Otherwise we end up in a situation where core exceptions have codes while provider exceptions (which are the majority of real-world failures) do not.

To me, the strongest argument for structured IDs/categories is observability systems that need a strict filtering/aggregation field. I can definitely see value there. But I think the LLM argument is weaker, since LLMs already work reasonably well with descriptive exception messages and stack traces as-is.

The provider rollout also feels tricky to me. For example, if a user is running the latest version of provider A (with this error system) but an older version of provider B (without it), then the same Airflow deployment suddenly produces two different styles of errors depending on which provider raised the exception. That inconsistency feels difficult to avoid in an independently versioned provider ecosystem like Airflow’s (and I don't believe implementing this in the common provider is enough to mitigate this concern).

I think better error messages themselves would probably help users more than error codes. Clear exceptions with actionable context are already searchable and reasonably LLM-friendly without introducing another taxonomy layer on top of Python exceptions.

Thanks,
Sameer Mesiah
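
P.S. To make the "descriptive exception plus an optional coarse category" idea a bit more concrete, here is a rough sketch. Everything in it is hypothetical and only for illustration: `ExampleProviderTimeoutError`, the `category` attribute, `error_category`, `log_failure`, and the `error_category` log field are names I made up, not existing Airflow or provider APIs. The point is only that a provider-specific exception can carry actionable context, and that a category for aggregation could be attached per exception class without a central sequential registry.

```python
from __future__ import annotations

import logging


class ExampleProviderTimeoutError(Exception):
    """Hypothetical provider-specific exception (not a real Airflow class)."""

    # Hypothetical coarse category for filtering/aggregation; not an AERR code.
    category = "timeout"

    def __init__(self, *, conn_id: str, endpoint: str, timeout_s: float) -> None:
        self.conn_id = conn_id
        self.endpoint = endpoint
        self.timeout_s = timeout_s
        # The message states what failed, where, and what the user can try next.
        super().__init__(
            f"Request to {endpoint!r} using connection {conn_id!r} timed out "
            f"after {timeout_s:g}s. Check that the endpoint is reachable from "
            f"the worker, or increase the timeout configured for this connection."
        )


def error_category(exc: BaseException) -> str:
    """Return the exception's category, or a fallback for exceptions without one."""
    return getattr(exc, "category", "uncategorized")


def log_failure(logger: logging.Logger, exc: BaseException) -> None:
    """Log the message and expose the category as a structured field."""
    logger.error("%s", exc, extra={"error_category": error_category(exc)})
```

With this shape, an exception from an older provider that knows nothing about categories (or a native SDK exception) simply falls back to "uncategorized" instead of producing a different style of error, which is the mixed-version case I was worried about above.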
