Which tools are the offending tools?

Cyberpower678
English Wikipedia Account Creation Team
English Wikipedia Administrator
Global User Renamer

> On Aug 2, 2017, at 13:04, Bryan Davis <bd...@wikimedia.org> wrote:
>
> We saw a big spike of active Grid Engine jobs starting around
> 2017-08-01T00:00. I've been looking at the list of active jobs and
> noticed that several tools had a lot of copies of the same job
> running. There are tools that are designed to have several copies of
> the same job running, working from a shared queue of some sort, but
> often this is a sign that something is wrong with the script.
>
> Here's a fancy shell pipeline that will give you a list of all of your
> tool's running jobs grouped by job name and sorted by start time:
>
>   qstat -xml |
>   tr '\n' ' ' |
>   sed 's#<job_list[^>]*>#\n#g' |
>   sed 's#<[^>]*>##g' |
>   grep " " |
>   column -t |
>   awk 'BEGIN { OFS="\t" } {print $1, $3, $6, $5}' |
>   sort -n -k 3|sort -s -k 2
>
> You can use this to see if you have parallel jobs running and, if so,
> when the "stuck" jobs started. It seems that there may have been some
> database-related events happening between 2017-07-31T23:00 and
> 2017-08-01T06:00 that left a bunch of jobs stuck in a bad state
> internally.
>
> To keep your cron-scheduled jobs from running in parallel, you can add
> the `-once` flag to your crontab. Either `jsub -once ...` or `qcronsub
> ...` will do this for you. When the once flag is active, jsub and
> qcronsub will look for jobs that your tool is already running, and if
> there is an active job with the same name then the new job will *not*
> be started and an error message will be logged. The name is either
> provided explicitly with `-N ....` or automatically added based on the
> command if -N is not used.
>
> (This should probably end up on wikitech in the help somewhere...)
>
> Bryan
> --
> Bryan Davis              Wikimedia Foundation    <bd...@wikimedia.org>
> [[m:User:BDavis_(WMF)]]  Manager, Cloud Services          Boise, ID USA
> irc: bd808                                        v:415.839.6885 x6855
>
> _______________________________________________
> Labs-l mailing list
> Labs-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
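
For anyone who wants a quicker answer to "do I have duplicate jobs?", here is a rough sketch that builds on the pipeline quoted above. It is not part of the original message. The function name `duplicate_jobs` is made up for illustration, and it assumes the pipeline's output is tab-separated with the columns job id, job name, start time, and state, which is what the `awk` step appears to produce.

```shell
# Hypothetical helper: list job names that have more than one active copy.
# Reads the tab-separated output of the quoted pipeline on stdin.
# Assumed columns: job id, job name, start time, state.
duplicate_jobs() {
  awk -F '\t' '
    # Count how many rows share each job name (column 2).
    { count[$2]++ }
    END {
      # Print only the names that occur more than once, with their count.
      for (name in count)
        if (count[name] > 1)
          printf "%s\t%d\n", name, count[name]
    }
  ' | sort
}
```

To use it, append `| duplicate_jobs` to the end of the pipeline. Each output line is a job name followed by the number of active copies; no output means no job name is running more than once.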