Which tools are the offending tools?

Cyberpower678
English Wikipedia Account Creation Team
English Wikipedia Administrator
Global User Renamer

> On Aug 2, 2017, at 13:04, Bryan Davis <bd...@wikimedia.org> wrote:
> 
> We saw a big spike of active Grid Engine jobs starting around
> 2017-08-01T00:00. I've been looking at the list of active jobs and
> noticed that several tools had a lot of copies of the same job
> running. Some tools are designed to run several copies of the same
> job, all working from a shared queue of some sort, but often this is a
> sign that something is wrong with the script.
> 
> Here's a fancy shell pipeline that will give you a list of all of your
> tool's running jobs grouped by job name and sorted by start time:
> 
>  qstat -xml |
>  tr '\n' ' ' |
>  sed 's#<job_list[^>]*>#\n#g' |
>  sed 's#<[^>]*>##g' |
>  grep " " |
>  column -t |
>  awk 'BEGIN { OFS="\t" } {print $1, $3, $6, $5}' |
>  sort -n -k 3|sort -s -k 2
> 
> You can use this to see if you have parallel jobs running and if so
> when the "stuck" jobs started. It seems that there may have been some
> database related events happening between 2017-07-31T23:00 and
> 2017-08-01T06:00 that left a bunch of jobs stuck in a bad state
> internally.
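
A possible follow-on to the pipeline above, as a minimal sketch: it assumes the pipeline's output is tab-separated with the job name in the second column (which is what the awk step appears to print), and it lists only the job names that have more than one active copy. The function name `duplicate_jobs` is made up for illustration.

```shell
# Hypothetical helper: read the pipeline's tab-separated output on stdin
# and print "<count><TAB><job name>" for every job name seen more than once.
# Assumes column 2 holds the job name.
duplicate_jobs() {
    awk -F '\t' '
        { count[$2]++ }
        END {
            for (name in count)
                if (count[name] > 1)
                    print count[name] "\t" name
        }
    ' | sort -k 2
}
```

To use it, append `| duplicate_jobs` to the end of the pipeline. No output means no job name is running more than once.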
> 
> To keep your cron scheduled jobs from running in parallel, you can add
> the `-once` flag to your crontab. Either `jsub -once ...` or `qcronsub
> ...` will do this for you. When the once flag is active, jsub and
> qcronsub will look for jobs that your tool is already running and if
> there is an active job with the same name then the new job will *not*
> be started and an error message will be logged. The name is either
> provided explicitly with `-N ....` or derived automatically from the
> command if `-N` is not used.
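
For illustration, crontab entries along these lines would match the description above. This is only a sketch: the schedule, the job name `update-stats`, and the script path are hypothetical placeholders, not taken from any real tool.

```crontab
# Hypothetical: submit every 10 minutes, but skip the submission if a job
# named "update-stats" is already active for this tool.
*/10 * * * * jsub -once -N update-stats $HOME/bin/update-stats.sh

# Hypothetical equivalent using qcronsub, which is described above as
# applying the same "once" behaviour without the extra flag.
*/10 * * * * qcronsub -N update-stats $HOME/bin/update-stats.sh
```

Giving the job an explicit name with `-N` keeps the duplicate check independent of how the command line happens to be written.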
> 
> (This should probably end up on wikitech in the help somewhere...)
> 
> Bryan
> -- 
> Bryan Davis              Wikimedia Foundation    <bd...@wikimedia.org>
> [[m:User:BDavis_(WMF)]] Manager, Cloud Services          Boise, ID USA
> irc: bd808                                        v:415.839.6885 x6855
> 
> _______________________________________________
> Labs-l mailing list
> Labs-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
