Colvin Cowie created SOLR-15558:
-----------------------------------

             Summary: Solr stop doesn't handle zombie processes
                 Key: SOLR-15558
                 URL: https://issues.apache.org/jira/browse/SOLR-15558
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Colvin Cowie


When calling solr stop on Linux, this command is used to check whether the
process is still running:
_CHECK_PID=`ps auxww | awk '{print $2}' | grep -w $SOLR_PID | sort -r | tr -d ' '`_
[https://github.com/apache/solr/blob/122c88a0748769432ef62cc3fb94c2226dd67aa7/solr/bin/solr#L871]
 
If Solr has stopped but remains as a zombie process, its entry stays in the
process table, so _ps auxww_ continues to show the PID even after kill -9.
The result is something like the following, with 3 minutes wasted waiting
for a dead process to exit:
 
_[2021-07-21T09:15:12.365Z] Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 12622 to stop gracefully._
_[2021-07-21T09:18:13.551Z]  [|] Solr process 12622 is still running; jstacking it now._
_[2021-07-21T09:18:21.806Z] 12622: Unable to open socket file /proc/12622/root/tmp/.java_pid12622: target process 12622 doesn't respond within 10500ms or HotSpot VM not loaded_
_[2021-07-21T09:18:21.806Z] Solr process 12622 is still running; forcefully killing it now._
_[2021-07-21T09:18:21.806Z] Killed process 12622_
_[2021-07-21T09:18:31.678Z] ERROR: Failed to kill previous Solr Java process 12622 ... script fails._
 
But the output of ps auxww does identify Zombie processes under STAT:
_USER       PID %CPU %MEM    VSZ   RSS TTY      *STAT* START   TIME COMMAND_
_root     12622  1.4  0.0      0     0 pts/1    *Z*    10:42   0:26 [java] *<defunct>*_
 
So the CHECK_PID could filter out zombies.
Obviously the bigger issue is why the process has ended up as a zombie. In
this case it was because of the PID 1 zombie reaping problem described at
[https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/]
and not specifying "--init" when running Solr inside a Docker container. So
maybe a message warning that the process is a zombie is worth having, so that
the user has an opportunity to do something about it.
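
One possible shape for such a filter, as a minimal sketch only: it assumes
the STAT value is the 8th whitespace-separated field of the _ps auxww_ output
(as in the listing above) and that a zombie's state starts with Z. The helper
names live_pid_from_ps and zombie_pid_from_ps and the wording of the messages
are made up for illustration and are not part of bin/solr. The sample row is
the zombie entry quoted above with the emphasis markers removed.

```shell
# Hypothetical helpers, not part of bin/solr. Both read `ps auxww`-style
# output on stdin and assume PID is field 2 and STAT is field 8.

# Print the PID only if it is listed and its STAT does not start with Z.
live_pid_from_ps() {
  awk -v pid="$1" '$2 == pid && $8 !~ /^Z/ { print $2 }'
}

# Print the PID only if it is listed as a zombie (STAT starts with Z).
zombie_pid_from_ps() {
  awk -v pid="$1" '$2 == pid && $8 ~ /^Z/ { print $2 }'
}

# Demo on the zombie row from this report instead of live ps output.
SAMPLE='root 12622 1.4 0.0 0 0 pts/1 Z 10:42 0:26 [java] <defunct>'
SOLR_PID=12622

CHECK_PID=`printf '%s\n' "$SAMPLE" | live_pid_from_ps "$SOLR_PID"`
ZOMBIE_PID=`printf '%s\n' "$SAMPLE" | zombie_pid_from_ps "$SOLR_PID"`

# Illustrative messages only; the real script's wording may differ.
if [ -n "$ZOMBIE_PID" ]; then
  echo "WARNING: Solr process $ZOMBIE_PID is a zombie (exited but not reaped by its parent)."
fi
if [ -z "$CHECK_PID" ]; then
  echo "No live Solr process $SOLR_PID found; nothing to wait for."
fi
```

In the stop script the helpers would read from _ps auxww_ rather than from a
sample string. Comparing the PID field directly in awk also replaces the
separate grep -w step, since only an exact match on field 2 is printed.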
 
Note from [~mdrob]
{quote}That seems like a reasonable check to add, the only caution I would 
advise
is that a lot of developers use macs for local testing so make sure that
whatever flags you invoke are generally cross platform compatible, or
hidden behind appropriate conditions.{quote}
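
With that caution in mind, an alternative to parsing the full process list is
to ask ps for the state of a single PID. The following is a sketch under
assumptions, not a tested patch: the stat output keyword is, to my knowledge,
accepted by both the Linux procps ps and the BSD/macOS ps, but it is not among
the format names POSIX requires, so other ps implementations may reject it.
All three function names are hypothetical.

```shell
# Hypothetical helper: succeeds if a ps state string marks a zombie.
stat_is_zombie() {
  case "$1" in
    Z*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Hypothetical helper: prints the state of one PID with spaces stripped.
# Prints nothing if the PID is gone or if ps rejects these flags.
pid_state() {
  ps -o stat= -p "$1" 2>/dev/null | tr -d ' '
}

# Hypothetical helper: succeeds only if the PID exists and is not a zombie.
# Caveat: an unsupported ps also yields an empty state and so reads as
# "not alive"; a real patch would need a fallback for that case.
pid_is_alive() {
  state=`pid_state "$1"`
  [ -n "$state" ] && ! stat_is_zombie "$state"
}
```

A wait loop could then poll pid_is_alive "$SOLR_PID" and stop waiting as soon
as it fails, whether the process has disappeared or has become a zombie.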



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
