I am attempting to set up automatic job pre-emption using the maui
preemptor/preemptee queue mechanism, with the pre-empted openmpi jobs
being checkpointed by blcr. Almost all of this works, and my thanks are
due to Eric Roman of LBL, who has written a procedure, cr_mpirun, to
facilitate the openmpi-blcr interaction, which he intends to release
soon. I believe that the remaining problem is caused by a maui-torque
interaction, on which I am seeking advice here.

The problem is that for some jobs, identified below, the job on which a
hold has been placed restarts instantly and is checkpointed again. As a
result, the time stamp on the ckpt file does not match what was
expected, so the pre-empted job goes into the W state and the preemptor
cannot start. (The other problem, of incomplete .o files, is minor by
comparison because the simple workaround identified below will suffice
until the problem is fixed.) I have experimented unsuccessfully with a
few modifications to maui, as indicated below. I was hoping for some
advice on what else I might try. I would also be interested to know
whether the moab-torque-openmpi-blcr combination is working for anyone.
Thank you. Greg Doherty

A job asking for m nodes can pre-empt jobs with p<=m nodes, and restart
the original jobs. However, the .o files of the pre-empted jobs do not
contain the output produced prior to them being checkpointed. For the
time being, this can be circumvented by redirecting stdout to a file
when executing the cr_mpirun command, which works OK.

A job asking for m nodes cannot successfully pre-empt a job already
running with p>m nodes. I believe that this is because the pre-empted
job restarts immediately, so the ckpt files have labels which don't
match. I tried to modify MPBSI.c to stop the pre-empted job from
restarting immediately, by adding a pbs_alterjob between the pbs_holdjob
and the pbs_rlsjob to delay the execution of the pre-empted job by one
minute, but that simply fails with a pbs_error of 15016:
04/18 16:21:35 MRMJobCheckpoint(1272,1,SC)
04/18 16:21:35 MPBSJobCkpt(1272,R,SC)
04/18 16:21:37 MPBSJobCkpt(Execution_Time, 1622.37)
04/18 16:21:37 MPBSJobCkpt(Illegal attribute or resource value for )
04/18 16:21:37 ERROR: PBS job '1272.liberty.ansto.gov.au' attr 'Execution_Time:' to '1622.37' (rc: 15016 'Illegal attribute or resource value for ')
04/18 16:21:37 INFO: attribute 'PREEMPTEE' set for job 1272

So, obviously I don't know what I am doing. I have fiddled with various
strings to include the month and day when trying to reset the execution
time, but to no avail. Probably pbs_alterjob does not want me to fiddle
with execution time at all at this point in proceedings. I can't find
very much detailed documentation on those attributes. I have also
experimented with short sleep()s between the pbs_holdjob and pbs_rlsjob,
to no avail.
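
One possibility worth checking, offered as an assumption rather than a
confirmed diagnosis: the qalter client appears to convert its -a date
argument into seconds since the epoch before sending it, so the server
may expect the Execution_Time value in that form rather than as a date
string, and may also expect the resource field of the attrl to be NULL
rather than an empty string. The helper below is hypothetical (it is not
part of maui or TORQUE) and only shows how such a value string could be
built; whether pbs_alterjob accepts it for a held, checkpointed job would
still need to be tested.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of maui or TORQUE.
 * Writes (now + delay_secs) into buf as decimal seconds since the epoch.
 * ASSUMPTION: this is the form the server wants for Execution_Time.
 * Returns 0 on success, -1 if buf is too small or formatting failed. */
int format_exec_time_epoch(time_t now, long delay_secs, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);

    n = snprintf(buf, len, "%ld", (long)now + delay_secs);

    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

In MPBSJobCkpt this would be called as
format_exec_time_epoch(time(NULL), 60, Cktmps, sizeof(Cktmps)) in place
of the localtime()/strftime() pair, with Ckattrib.resource set to NULL.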

I enclose the following in case you can see immediately that I have done
something stupid.
-------------------------------------------------------------------
int MPBSJobCkpt(

  mjob_t  *J,              /* I (modified) */
  mrm_t   *R,              /* I */
  mbool_t  DoTerminateJob, /* I (boolean) */
  char    *Msg,            /* O (optional) */
  int     *SC)             /* O (optional) */

  {
  struct attrl Ckattrib;
  char        *CkRptr;
  time_t       Cktime;
  struct tm   *Cktmp;
  char         Cktmps[256];
  char         Cktmpline[MAX_MLINE];

  Ckattrib.next = NULL;
  Ckattrib.name = ATTR_a;
  Ckattrib.op   = SET;

  Cktmpline[0] = '\0';
  CkRptr = Cktmpline;
  Ckattrib.resource = CkRptr;

  int   rc;
  int   holdtimeout;
  char *ErrMsg;
  char  tmpJobName[MAX_MNAME];

  const char *FName = "MPBSJobCkpt";

  DBG(2,fPBS) DPrint("%s(%s,R,SC)\n",
    FName,
    (J != NULL) ? J->Name : "NULL");

  if ((J == NULL) ||
      (R == NULL) ||
      ((J->State != mjsStarting) && (J->State != mjsRunning)))
    {
    return(FAILURE);
    }

  MJobGetName(J,NULL,R,tmpJobName,sizeof(tmpJobName),mjnRMName);

  rc = blocking_pbs_holdjob(R->U.PBS.ServerSD,tmpJobName,"s",NULL);

  /* still ok to release the job if the hold timed out, the request was
   * successful. */

  if (rc != -2) { holdtimeout = 0; } else { holdtimeout = 1; }

  if (rc != 0 && !holdtimeout)
    {
    ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);

    DBG(0,fPBS) DPrint("ERROR: PBS job '%s' cannot be checkpointed (rc: %d '%s')\n",
      J->Name,
      rc,
      ErrMsg);

    if (R->FailIteration != MSched.Iteration)
      {
      R->FailIteration = MSched.Iteration;
      R->FailCount = 0;
      }

    R->FailCount++;

    return(FAILURE);
    }

  for (rc=0; rc<256; rc++) {
    Cktmps[rc] = '\0';
  }

  Cktime = time(NULL);
  Cktime += 60;
  Cktmp = localtime(&Cktime);

  if (strftime(Cktmps, sizeof(Cktmps), "%m%d%H%M.%S", Cktmp) == 0) {
    DBG(0,fPBS) DPrint("ERROR: Greg's checkpoint addition %d \n",
      Cktime);

    return(FAILURE);
  }

  Ckattrib.value = Cktmps;

  DBG(2,fPBS) DPrint("%s(%s, %s)\n",
    FName, Ckattrib.name, Ckattrib.value);

  rc = pbs_alterjob(R->U.PBS.ServerSD, tmpJobName, &Ckattrib, NULL);

  ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);

  DBG(2,fPBS) DPrint("%s(%s)\n",
    FName, ErrMsg);

  if (rc != 0)
    {
    ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);

    DBG(2,fPBS) DPrint("ERROR: PBS job '%s' attr '%s:%s' to '%s' (rc: %d '%s')\n",
      tmpJobName,
      Ckattrib.name,
      Ckattrib.resource,
      Ckattrib.value,
      rc,
      ErrMsg);

    /* If I do not comment this bit out, maui simply stops of course
       and I do not even get to see all the debug messages in the log
       file.

    if (R->FailIteration != MSched.Iteration)
      {
      R->FailIteration = MSched.Iteration;
      R->FailCount = 0;
      }

    R->FailCount++;

    return(FAILURE);
    */
    }

  rc = pbs_rlsjob(R->U.PBS.ServerSD,tmpJobName,"s",NULL);

  if (rc != 0)
    {
    ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);

    DBG(0,fPBS) DPrint("ERROR: PBS job '%s' cannot be released from hold (rc: %d '%s')\n",
      J->Name,
      rc,
      ErrMsg);

    if (R->FailIteration != MSched.Iteration)
      {
      R->FailIteration = MSched.Iteration;
      R->FailCount = 0;
      }

    R->FailCount++;

    return(FAILURE);
    }

  if (holdtimeout) { return(FAILURE); }

  /* NOTE: 'DoTerminateJob' flag not supported */

  DBG(7,fPBS) DPrint("INFO: job '%s' checkpointed\n",
    J->Name);

  return(SUCCESS);
  } /* END MPBSJobCkpt() */