omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2444113421

   I have a suggestion for issues with multiple possible root causes: we can 
print an Airflow error code with the error message, e.g. `AERR055: Job 10 was 
killed before it finished`, and keep an error code mapping with possible root 
causes like (just examples, not real causes):
   
   | Error Code | Possible Commonly Observed Causes                       |
   |------------|---------------------------------------------------------|
   |  AERR055   | 1) Ran out of memory                                    |
   |            | 2) Job was stuck and killed after timeout               |
   |            | 3) Job being run on Spot Instance Node (K8S on EKS)     |
   
   Since error codes are shareable and easily searchable, they would be useful 
for team collaboration as well (e.g. instead of saying "I'm looking into the 
error `Job 10 was killed before it finished`", I can just say "I'm 
looking into AERR055"), much like how we use JIRA ticket numbers or GitHub 
issue/PR numbers.
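
   A rough sketch of what such a mapping could look like, just to make the idea concrete. All names here (`ErrorInfo`, `ERROR_CODES`, `format_error`, `possible_causes`) are hypothetical and do not exist in Airflow, and the causes are the same illustrative ones from the table, not real ones:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorInfo:
    """Hypothetical record for one error code (not an Airflow class)."""

    template: str
    possible_causes: tuple[str, ...]


# Hypothetical registry; the code and its causes are the illustrative
# examples from the table, not real Airflow diagnostics.
ERROR_CODES: dict[str, ErrorInfo] = {
    "AERR055": ErrorInfo(
        template="Job {job_id} was killed before it finished",
        possible_causes=(
            "Ran out of memory",
            "Job was stuck and killed after timeout",
            "Job being run on Spot Instance Node (K8S on EKS)",
        ),
    ),
}


def format_error(code: str, **params: object) -> str:
    """Render the message for a known code, prefixed with the code itself."""
    info = ERROR_CODES[code]
    return f"{code}: {info.template.format(**params)}"


def possible_causes(code: str) -> tuple[str, ...]:
    """Return the commonly observed causes, or an empty tuple if the code is unknown."""
    info = ERROR_CODES.get(code)
    return info.possible_causes if info else ()


if __name__ == "__main__":
    print(format_error("AERR055", job_id=10))
    for n, cause in enumerate(possible_causes("AERR055"), start=1):
        print(f"  {n}) {cause}")
```

   Keeping the message template and the causes in one registry entry means the code printed in the logs and the code people search for cannot drift apart.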


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
