I suggest checking the runtime logs of the master and worker servers for
WARN/ERROR messages around the time you received the "server hung up" alarm.
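To make that log check concrete, here is a minimal Python sketch that prints every WARN or ERROR line from the log files passed on the command line. It is not part of DolphinScheduler. The script, the function names, and the assumption that each log line carries the level as a plain WARN/ERROR word are all hypothetical. If your logback layout differs, adjust the pattern, and narrow the output to the time of the alarm.

```python
import re
import sys
from typing import Iterable, List

# Matches a standalone WARN or ERROR token, e.g. "[WARN]" or " ERROR ".
# Assumption: the level appears as a plain word in each log line;
# adjust this pattern if your logback layout differs.
LEVEL_PATTERN = re.compile(r"\b(WARN|ERROR)\b")


def find_problems(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain a WARN or ERROR level marker."""
    return [line.rstrip("\n") for line in lines if LEVEL_PATTERN.search(line)]


def main(paths: List[str]) -> None:
    # Each path is a master or worker log file given on the command line
    # (hypothetical usage; actual file names depend on your deployment).
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in find_problems(handle):
                print(f"{path}: {line}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage (hypothetical file names): `python find_problems.py <master-log> <worker-log>`.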


Best Regards



---------------
Apache DolphinScheduler PMC Chair
LidongDai
[email protected]
Linkedin: https://www.linkedin.com/in/dailidong
Twitter: @WorkflowEasy <https://twitter.com/WorkflowEasy>
---------------


On Mon, Nov 22, 2021 at 10:54 AM 王峰 wrote:

> We have 3 nodes, and the 2 masters and the workers are deployed on the same
> machines. No server went down, but we received a "server hung up" alarm. My
> guess is that insufficient machine resources affected the server and
> triggered fault tolerance. After the task was flagged as failed, the
> original task did not actually stop, and a new task instance was started on
> another server.
>
> At 2021-11-21 18:41:49, "Lidong Dai" wrote:
> >Hi,
> >Could you describe the problem more clearly? Does "host load" refer to the
> >Master or the Worker server? Did any server go down?
> >
> >Best Regards
> >
> >On Sun, Nov 21, 2021 at 3:59 PM 王峰 wrote:
> >>
> >> DolphinScheduler 1.3.3 cluster
> >>
> >> We have the following scenario: because the host load is too high,
> >> master fault tolerance can be triggered mid-run, and the same workflow
> >> instance then runs twice (the two runs overlap in time), which doubles
> >> the data.
>
