potiuk commented on PR #39543:
URL: https://github.com/apache/airflow/pull/39543#issuecomment-2156783300
> @potiuk I thought about this a little more. I think we should keep a note
> about OOMKill for -9 (and maybe add a blurb about other things that could cause
> -9 like you suggest), but we should log something different for when
> the return code is None. In this case, we should simply indicate that the task
> was killed for some unknown reason. I think that just assuming -9 is
> misleading, and causes more confusion. What do you think?
I think anything where we have a space (in our docs) where we can direct
users (via a link) to look for the cause of a problem is good. Even if it is
incomplete but says "these can be the reasons, but there are more", it is way
better than anything that gives the user no clue whatsoever. We can add more
stuff there over time, even if the initial assessment is not complete; every
single time we discuss with a user and find another reason, we can update that
documentation and make it better. If another committer looks at it and has no
clue, they can also learn from that information - that's why it should provide
context and say where the error might be generated from. I think just providing
a log explaining WHAT happened, without telling the context of WHY it happened
and HOW they can fix the problem, will inevitably lead to users asking in
issues or discussions what to do. And our goal should be:
a) either they find a possible cause by following the docs
b) or when they post an issue or discussion, other users will follow the docs
and advise them
c) or a maintainer finds the root cause by investigation and not only tells it
to the user but also updates the documentation to include that reason