Hoo man has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/393923 )

Change subject: Fix killing dumpers in Wikidata entity dumpers
......................................................................

Fix killing dumpers in Wikidata entity dumpers

We currently assume that process group id = parent process id
when trying to kill all Wikidata dump shards for a restart.
While this often holds, it doesn't when the bash script is run via sudo/
cron, as I discovered. Also the PHP processes spawned are not part of
this PGID at all.

To fix this, we need to recursively find all sub*processes and then kill
them. I wasn't able to come up with a nicer way to do this and
stackoverflow also didn't bring up much useful here:
https://stackoverflow.com/questions/392022/best-way-to-kill-all-child-processes

Change-Id: I391e64ed24a05741874a9a533011b954ac5e4e11
---
M modules/snapshot/files/cron/dumpwikidatajson.sh
M modules/snapshot/files/cron/dumpwikidatardf.sh
M modules/snapshot/files/cron/wikidatadumps-shared.sh
3 files changed, 20 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/23/393923/1

diff --git a/modules/snapshot/files/cron/dumpwikidatajson.sh b/modules/snapshot/files/cron/dumpwikidatajson.sh
index 136e851..e3f7383 100644
--- a/modules/snapshot/files/cron/dumpwikidatajson.sh
+++ b/modules/snapshot/files/cron/dumpwikidatajson.sh
@@ -36,8 +36,8 @@
 			echo -e "\n\n(`date --iso-8601=minutes`) Process for shard $i failed with exit code $exitCode" >> $errorLog
 			echo 1 > $failureFile
 
-			# Kill all remaining dumpers and start over.
-			kill -- -$$
+			# Kill all sub*-processes of the (parent) bash process and start over
+			killAllSubProcesses
 		fi
 	) &
 	let i++
diff --git a/modules/snapshot/files/cron/dumpwikidatardf.sh b/modules/snapshot/files/cron/dumpwikidatardf.sh
index 8c3a22d..17efe5e 100755
--- a/modules/snapshot/files/cron/dumpwikidatardf.sh
+++ b/modules/snapshot/files/cron/dumpwikidatardf.sh
@@ -63,8 +63,8 @@
 			echo -e "\n\n(`date --iso-8601=minutes`) Process for shard $i failed with exit code $exitCode" >> $errorLog
 			echo 1 > $failureFile
 
-			# Kill all remaining dumpers and start over.
-			kill -- -$$
+			# Kill all sub*-processes of the (parent) bash process and start over
+			killAllSubProcesses
 		fi
 	) &
 	let i++
diff --git a/modules/snapshot/files/cron/wikidatadumps-shared.sh b/modules/snapshot/files/cron/wikidatadumps-shared.sh
index 1cd6f1d..e0c4e29 100644
--- a/modules/snapshot/files/cron/wikidatadumps-shared.sh
+++ b/modules/snapshot/files/cron/wikidatadumps-shared.sh
@@ -60,3 +60,19 @@
 function runDcat {
 	php /usr/local/share/dcat/DCAT.php --config=/usr/local/etc/dcatconfig.json --dumpDir=$targetDirBase --outputDir=$targetDirBase
 }
+
+function killAllSubProcesses {
+	# Get direct subprocess
+	PIDS=`pgrep -P $$`
+	# Get indirect subprocesses (duplicates are ok here)
+	PIDS="$PIDS $(pgrep -P `echo $PIDS | sed 's/ /,/g'`)"
+	PIDS="$PIDS $(pgrep -P `echo $PIDS | sed 's/ /,/g'`)"
+	PIDS="$PIDS $(pgrep -P `echo $PIDS | sed 's/ /,/g'`)"
+
+	# Make sure we're not killing ourselves
+	PIDS=`echo $PIDS | sed "s/\b$$\b//g"`
+
+	# Use nohup here to make sure the kill is not getting stopped itself
+	# when invoked in a subshell
+	nohup kill $PIDS > /dev/null
+}

-- 
To view, visit https://gerrit.wikimedia.org/r/393923
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I391e64ed24a05741874a9a533011b954ac5e4e11
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Hoo man <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
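
The commit message describes recursively finding all sub*processes and then killing them, while killAllSubProcesses unrolls the pgrep lookup to a fixed depth. For comparison only, here is a minimal sketch of a fully recursive walk built on the same `pgrep -P` call. The names listDescendants and killDescendants are hypothetical and not part of this change, and the sketch assumes bash (for $BASHPID) and a pgrep that supports -P. Processes forked while the walk is running can still be missed.

```shell
#!/bin/bash

# Hypothetical helper (not part of the change): print every descendant PID
# of $1, one per line, by walking the process tree depth-first with pgrep -P.
listDescendants() {
	local child
	for child in $(pgrep -P "$1"); do
		# Skip the (sub)shell that is running this function, if it shows up.
		[ "$child" = "$BASHPID" ] && continue
		echo "$child"
		listDescendants "$child"
	done
}

# Hypothetical helper: send SIGTERM to every descendant of $1 except the
# calling (sub)shell itself, so the caller survives to start over.
killDescendants() {
	local self=$BASHPID pid
	for pid in $(listDescendants "$1"); do
		[ "$pid" = "$self" ] && continue
		# A PID may already be gone by now; ignore that.
		kill "$pid" 2>/dev/null || true
	done
	return 0
}
```

Under these assumptions, calling `killDescendants $$` from inside one of the `( ... ) &` shard subshells would play the role of killAllSubProcesses. Skipping $BASHPID is one way to keep the caller from signalling itself; whether it fully replaces the nohup workaround in the change has not been verified against the real dump scripts.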
