[
https://issues.apache.org/jira/browse/SOLR-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Drob resolved SOLR-15558.
------------------------------
Fix Version/s: 9.0, 8.11.2
Resolution: Fixed
> Solr stop doesn't handle zombie processes
> -----------------------------------------
>
> Key: SOLR-15558
> URL: https://issues.apache.org/jira/browse/SOLR-15558
> Project: Solr
> Issue Type: Bug
> Reporter: Colvin Cowie
> Assignee: Mike Drob
> Priority: Trivial
> Fix For: 9.0, 8.11.2
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> When calling solr stop on Linux, this command is used to check for the process:
> _CHECK_PID=`ps auxww | awk '{print $2}' | grep -w $SOLR_PID | sort -r | tr -d ' '`_
> [https://github.com/apache/solr/blob/122c88a0748769432ef62cc3fb94c2226dd67aa7/solr/bin/solr#L871]
>
> If Solr has stopped but remains as a zombie process, its entry stays in the
> process table, so _ps auxww_ continues to show the PID even after kill -9.
> The result looks something like this, with 3 minutes wasted waiting for a
> dead process to exit:
>
> _[2021-07-21T09:15:12.365Z] Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 12622 to stop gracefully._
> _[2021-07-21T09:18:13.551Z] [|] Solr process 12622 is still running; jstacking it now._
> _[2021-07-21T09:18:21.806Z] 12622: Unable to open socket file /proc/12622/root/tmp/.java_pid12622: target process 12622 doesn't respond within 10500ms or HotSpot VM not loaded_
> _[2021-07-21T09:18:21.806Z] Solr process 12622 is still running; forcefully killing it now._
> _[2021-07-21T09:18:21.806Z] Killed process 12622_
> _[2021-07-21T09:18:31.678Z] ERROR: Failed to kill previous Solr Java process 12622 ... script fails._
>
> But the output of ps auxww does identify Zombie processes under STAT:
> _USER PID %CPU %MEM VSZ RSS TTY *STAT* START TIME COMMAND_
> _root 12622 1.4 0.0 0 0 pts/1 *Z* 10:42 0:26 [java] *<defunct>*_
>
> So the CHECK_PID check could filter out zombies.
> Obviously the bigger issue is why the process ended up as a zombie. In this
> case it was because of
> [https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/]
> and not specifying "--init" when running Solr inside a Docker container. A
> message warning that the process is a zombie may therefore be worth having,
> so that the user has an opportunity to do something about it.
>
> Note from [~mdrob]
> {quote}That seems like a reasonable check to add, the only caution I would
> advise is that a lot of developers use Macs for local testing so make sure
> that whatever flags you invoke are generally cross-platform compatible, or
> hidden behind appropriate conditions.{quote}
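
As a rough illustration of the zombie filter suggested in the description above, the sketch below asks ps for only the STAT column of a single PID and classifies the result. It is not the change that was committed for this issue. The function names classify_stat and pid_state are hypothetical, and the sketch assumes that `ps -o stat= -p PID` behaves the same on Linux (procps) and macOS, which is worth verifying given the cross-platform caution quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from bin/solr.

# Map a ps STAT value to "gone", "zombie", or "running".
classify_stat() {
  case "$1" in
    "") echo "gone" ;;      # ps printed nothing: no such process
    Z*) echo "zombie" ;;    # state starts with Z: defunct, waiting to be reaped
    *)  echo "running" ;;   # any other state: the process still exists
  esac
}

# Look up the STAT column for one PID and classify it.
# Assumption: "-o stat=" (state column, header suppressed) and "-p" are
# accepted by both procps ps on Linux and BSD ps on macOS.
pid_state() {
  local stat
  stat="$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')"
  classify_stat "$stat"
}
```

A stop loop built on this could treat a "zombie" result from `pid_state "$SOLR_PID"` like "gone", and print a warning that the process is defunct rather than waiting out the full timeout.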
--
This message was sent by Atlassian Jira
(v8.20.1#820001)