On this problem, the underlying issue is that the process became a zombie and was never cleaned up. As best we can tell, the waitpid() call behind Java's Process class never woke up. This looks like a possible kernel bug in SLES SP2: the system log shows bug warnings that correlate exactly with the process being zombified.

There's really nothing we can do to clean up the zombie. I killed and restarted the parent Agent, and almost 18 hours later the zombie is still an undead child of init.

Probably the only thing to do is for the Agent to detect the situation and report that the process is gone (it really is).
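
One way the detection might look, as a rough sketch only: on Linux the third field of /proc/<pid>/stat is the process state, and 'Z' marks a zombie. The class and method names below (DefunctCheck, parseState, isDefunct) are made up for illustration and are not existing Agent code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of DUCC: checks whether a pid is a zombie on Linux.
final class DefunctCheck {

    /** Extracts the state character from the contents of /proc/<pid>/stat. */
    public static char parseState(String statLine) {
        // The command name is wrapped in parentheses and may itself contain
        // spaces or ')', so locate the LAST ')' and read the field after it.
        int close = statLine.lastIndexOf(')');
        if (close < 0 || close + 2 >= statLine.length()) {
            throw new IllegalArgumentException("unexpected stat format: " + statLine);
        }
        return statLine.charAt(close + 2);
    }

    /** Returns true if the process exists and is in state 'Z' (zombie/defunct). */
    public static boolean isDefunct(long pid) throws IOException {
        Path stat = Paths.get("/proc", Long.toString(pid), "stat");
        try {
            String line = new String(Files.readAllBytes(stat), StandardCharsets.UTF_8);
            return parseState(line) == 'Z';
        } catch (NoSuchFileException e) {
            // No /proc entry: the process has been fully reaped, so it is not defunct.
            return false;
        }
    }
}
```

The Agent could call something like isDefunct(pid) from its periodic process scan and stop reporting the process as Running when it returns true. This only works where /proc is available.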

The Process.waitFor() call is just a Java wait() on a monitor, which is normally notified when the waitpid() system call returns. It should be possible to interrupt it so the wait thread doesn't leak. It is hard to know what will and won't work because the situation can't, in general, be replicated in test.
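
A minimal sketch of the interrupt idea, assuming waitFor() responds to Thread.interrupt() as described above. BoundedWait and waitForExit are hypothetical names, and this has not been tried against the actual zombie situation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of DUCC: waits for a process with an upper bound.
final class BoundedWait {

    /** Returns the exit code, or null if the process did not exit within the timeout. */
    public static Integer waitForExit(final Process p, long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService waiter = Executors.newSingleThreadExecutor();
        Callable<Integer> task = () -> p.waitFor();
        Future<Integer> exit = waiter.submit(task);
        try {
            return exit.get(timeout, unit);
        } catch (TimeoutException e) {
            return null; // still not reaped; the caller decides what to report
        } catch (ExecutionException e) {
            throw new IllegalStateException("waitFor failed", e.getCause());
        } finally {
            // Interrupt the waiting thread (a no-op if it already finished)
            // so it does not leak, then release the executor.
            exit.cancel(true);
            waiter.shutdownNow();
        }
    }
}
```

A null return would be the point where the Agent checks whether the child has gone defunct instead of waiting forever.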

Jim
On 2/12/14 12:22 PM, Jerry Cwiklik (JIRA) wrote:
Jerry Cwiklik created UIMA-3612:
-----------------------------------

              Summary: DUCC Agent should detect defunct processes
                  Key: UIMA-3612
                  URL: https://issues.apache.org/jira/browse/UIMA-3612
              Project: UIMA
           Issue Type: Bug
           Components: DUCC
     Affects Versions: 1.0-Ducc
             Reporter: Jerry Cwiklik
             Assignee: Jerry Cwiklik


Agent's rogue process detector should change to detect a process that is
defunct. It's been observed that a process drops core and remains running as
defunct. Since it is up, the agent is happy and keeps reporting the process as
Running.

Trying to kill it via kill -9 doesn't help. It looks like the defunct process must
be cleaned up by root.

Modify code to change the state of such a process from Running to Defunct (?).




