Hey Zhao,

Yes, this is expected behavior. SIGKILL'ing NMs will result in all of
their container processes being leaked. This is really a consequence of
the way Linux handles orphaned child processes: the kernel simply
re-parents the orphan to PID 1, so its PPID becomes 1 and it continues
executing. I did some brief exploration of this here:

  http://riccomini.name/posts/linux/2012-09-25-kill-subprocesses-linux-bash/
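
To see the re-parenting in action, here is a minimal Python sketch (my own
illustration, not code from the post above): the parent exits abruptly,
standing in for a SIGKILL'd NM, and the orphaned child is adopted by PID 1
(or by a designated subreaper on newer systems).

  # Fork a child, let the parent die without reaping it, and watch the
  # child's PPID change.
  import os
  import time

  pid = os.fork()
  if pid == 0:
      # Child ("container"): report our parent before and after it dies.
      print("child %d: ppid before = %d" % (os.getpid(), os.getppid()))
      time.sleep(2)  # give the parent time to exit
      # Typically 1 (init); may be a designated subreaper on newer systems.
      print("child %d: ppid after  = %d" % (os.getpid(), os.getppid()))
      os._exit(0)

  # Parent ("NM"): die without reaping the child, as a SIGKILL'd process
  # would.
  os._exit(0)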

At LinkedIn, we do a couple of things:

1. Soft kill (SIGTERM) the NMs, to allow them to properly shut down all
containers.
2. Before deploying an NM, we verify that there are no existing processes
with "container_*" running with a PPID of 1.

You could also verify that *all* container_* processes are dead after
SIGKILL'ing the NM, if you want to make extra sure that you haven't leaked
containers (which could lead to double-writing messages in Samza).
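
For what it's worth, the check in (2) can be as simple as scanning the
process table. Here is a rough Python sketch of the idea (an illustration,
not our actual tooling); dropping the PPID filter gives you the
post-SIGKILL verification as well:

  # Walk /proc and flag any process whose command line contains
  # "container_" and whose parent PID is 1, i.e. a container orphaned by
  # a dead NM.
  import os

  def leaked_containers():
      leaked = []
      for entry in os.listdir('/proc'):
          if not entry.isdigit():
              continue
          try:
              with open('/proc/%s/cmdline' % entry) as f:
                  cmdline = f.read().replace('\0', ' ')
              with open('/proc/%s/stat' % entry) as f:
                  stat = f.read()
          except (IOError, OSError):
              continue  # the process exited while we were scanning
          # comm (the second stat field) may contain spaces, so split
          # after the closing paren; ppid is the second remaining field.
          ppid = int(stat.rsplit(')', 1)[1].split()[1])
          if 'container_' in cmdline and ppid == 1:
              leaked.append((int(entry), cmdline))
      return leaked

  if __name__ == '__main__':
      for pid, cmdline in leaked_containers():
          print('leaked container: pid=%d cmd=%s' % (pid, cmdline))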

In practice, since implementing (1) above, we haven't seen any leaked
containers. In cases where an NM dies unexpectedly (e.g. the JVM
segfaults), you still have to go in and clean up the leaked processes
manually.

Cheers,
Chris

On 1/16/15 5:20 AM, "Zhao Weinan" <[email protected]> wrote:

>Hi,
>
>We are running some Samza tasks on Hadoop YARN 2.4.1. For some reason we
>restarted the whole cluster by SIGKILL'ing the RMs and NMs, leaving the
>Samza tasks behind. We then found that the Samza tasks survived the
>SIGKILL and restart, which made it hard for us to locate the task
>processes across the cluster. Is that expected?
>
>Thanks!
