Tim Landscheidt (2014-05-12 01:58):
...
How do I make sure I get a notification of job failure?
You can tell jsub, or more precisely the underlying qsub, to
send a notification on several occasions with the option "-m"
(for example "-m e"; cf. "man qsub"):
| [...]
| -m b|e|a|s|n,...
| Available for qsub, qsh, qrsh, qlogin and
| qalter only.
| Defines or redefines under which circumstances
| mail is to be sent to the job owner
| or to the users defined with the -M option
| described below. The option arguments have
| the following meaning:
| `b' Mail is sent at the beginning of the job.
| `e' Mail is sent at the end of the job.
| `a' Mail is sent when the job is aborted or
| rescheduled.
| `s' Mail is sent when the job is suspended.
| `n' No mail is sent.
| Currently no mail is sent when a job is suspended.
| [...]
For example, "jsub -mem 10m -m a php" leads to:
| Job 821685 (php5) Aborted
| Exit Status = 139
| Signal = SEGV
| User = tools.wikilint
| Queue = [email protected]
| Host = tools-exec-07.eqiad.wmflabs
| Start Time = 05/11/2014 23:49:56
| End Time = 05/11/2014 23:49:56
| CPU = 00:00:00
| Max vmem = NA
| failed assumedly after job because:
| job 821685.1 died through signal SEGV (11)
NB: "Aborted" means aborted in the grid sense. A job that
simply exits with a non-zero status does not count: "jsub
-mem 10m -m a false", for example, will not generate a mail.
So you might want to use "-m aes" and filter out the
notifications about successful jobs in your mail client.
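A side note on the report above: "Exit Status = 139" is consistent with the usual shell convention that a status above 128 means "killed by signal (status - 128)", here 139 - 128 = 11, i.e. SEGV. Below is a minimal sketch for decoding such a status. describeStatus is a hypothetical helper name, not part of jsub or qsub, and it assumes the status follows that convention.

```shell
#!/bin/bash
# Hypothetical helper: decode an exit status using the common shell
# convention that values above 128 mean "killed by signal (status - 128)".
describeStatus() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exit code $status"
    fi
}

describeStatus 139   # the status from the report above
describeStatus 1     # what "false" would give
```

In bash, "kill -l 139" should print the matching signal name (SEGV) if you prefer names to numbers.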
Thanks. I've ended up mailing errors manually if the job is not aborted.
Below is my script in case anyone else wishes to use it.
#!/bin/bash

# Functions
# ----------------------------

# Mail the given message via exim.
function sendFailMail {
    echo -e "Subject: [job-fail] Something is wrong...\n\n$1" |
        /usr/sbin/exim -odf -i [email protected]
}

# Call directly after a command: mails its exit code if it is non-zero.
function checkResult {
    local result=$?
    if [ "$result" -ne 0 ]; then
        local message="[ERROR] exit code from command was non-zero: $result"
        echo "$message"
        # Quoted so the whole message is passed as $1, not just its first word.
        sendFailMail "$message"
    fi
}

# Mail the matching lines if the log contains warnings, errors or notices.
function checkLog {
    local logFile=$1
    local matchString="(Warning|Error|Notice):"
    if grep -Eq "$matchString" "$logFile"
    then
        local message="Log contains errors or something strange..."
        message="$message\n\n$(grep -E "$matchString" "$logFile")"
        echo -e "$message"
        sendFailMail "$message"
    fi
}

# Script
# ----------------------------
nowDate=$(date +"%Y-%m-%d %H:%M")
logFile="../dna_refresh-$(date +%Y-%m).php-out"

# Stop here if the directory is missing, so php does not run elsewhere.
cd /data/project/dna/public_html || exit 1
echo -e "\n\n-----------------------\n$nowDate\n- - - - - - - - - - - -\n" >> "$logFile"
php ./index.php &>> "$logFile"
#checkResult
checkLog "$logFile"
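One caveat if you enable checkResult: it reads $? on its first line, so it only works when called directly after the command you want to check. Below is a minimal sketch of one way to make that harder to get wrong, by wrapping the command instead. runChecked and notify are hypothetical names; notify only prints here, where the script above would call sendFailMail.

```shell
#!/bin/bash
# Stand-in for sendFailMail: prints instead of calling exim
# (an assumption so the sketch runs anywhere).
notify() {
    printf 'MAIL: %s\n' "$1"
}

# Run the given command and report if it exits with a non-zero status.
runChecked() {
    "$@"
    local result=$?   # read immediately, before anything else changes $?
    if [ "$result" -ne 0 ]; then
        notify "[ERROR] '$*' exited with status $result"
    fi
    return "$result"
}

runChecked true                               # succeeds, prints nothing
runChecked false || echo "continuing anyway"  # reports the failure
```

With this shape the php call would become something like "runChecked php ./index.php &>> "$logFile"", with the caveat that the redirection then also captures the notify output unless notify mails instead of printing.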