+1 for ExitOnOutOfMemoryError.
+HeapDumpOnOutOfMemoryError may produce very large files, though. How should
we clean up old dump files? WDYT?
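
One possible approach, as a rough sketch only: point the standard HotSpot
flag `-XX:HeapDumpPath` at a dedicated directory and prune old `.hprof`
files from it periodically. The function name `prune_heap_dumps`, the
7-day retention, and the example directory are all assumptions for
illustration, not anything IoTDB ships today.

```python
import time
from pathlib import Path


def prune_heap_dumps(dump_dir, max_age_days=7, now=None):
    """Delete *.hprof files in dump_dir older than max_age_days.

    Returns the names of the deleted files, sorted alphabetically.
    dump_dir is assumed to be the directory passed to -XX:HeapDumpPath.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # retention window in seconds
    deleted = []
    for path in sorted(Path(dump_dir).glob("*.hprof")):
        # Only regular files whose last modification is before the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return deleted


if __name__ == "__main__":
    # Hypothetical dump directory; adjust to the real -XX:HeapDumpPath value.
    print(prune_heap_dumps("/var/lib/iotdb/heapdumps", max_age_days=7))
```

Something like this could run from cron or a k8s sidecar. Only files ending
in `.hprof` are touched, so logs in the same directory are left alone.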

Thanks
ZhangJian He


On Tue, 14 Nov 2023 at 12:04, Gaofei Cao <[email protected]> wrote:

> +1,
>
> The `-XX:+ExitOnOutOfMemoryError` parameter keeps the process from
> running on after some key threads are lost, which benefits the system.
> If the IoTDB cluster is deployed on k8s, this parameter is even more
> important, because k8s can quickly schedule another pod to replace the
> OOM node.
> Besides, I think we can document `-XX:+ExitOnOutOfMemoryError` and
> `-XX:+HeapDumpOnOutOfMemoryError` in the user/DBA manual, which is
> important for finding the root cause of an OOM.
>
> Best,
> ----------------------
> Gaofei Cao
>
> On Mon, 13 Nov 2023 at 19:52, Yuan Tian <[email protected]> wrote:
> >
> > Hi all,
> >
> > Recently, we found in some real user cases that when an OOM occurs in the
> > DataNode process (we should ensure that OOM does not happen, but we all
> > know that bugs will always exist), some threads (e.g. RPC listening
> > threads) may exit unexpectedly, which can cause strange behavior. For
> > example, suppose the heartbeat listening thread on the DataNode exits
> > unexpectedly due to OOM, and the OOM then recovers on its own (some large
> > queries or compaction tasks end). The thread is never restarted, so the
> > DataNode remains in an unknown state, because the ConfigNode can no
> > longer contact it via heartbeat.
> >
> > Therefore, we feel that OOM is a high-risk error, and we should let the
> > process exit directly to avoid the loss of some key threads.
> >
> > I also ran an experiment and found that -XX:+ExitOnOutOfMemoryError and
> > -XX:+HeapDumpOnOutOfMemoryError do not conflict, which means we can keep
> > both in the JVM args. When an OOM happens, the JVM first dumps the heap
> > and then exits.
> >
> > I've made this change in my PR
> > (https://github.com/apache/iotdb/pull/11531).
> >
> > What do you think?
> >
> >
> >
> >
> > Best,
> > ----------------------
> > Yuan Tian
>
