I am attempting to implement automatic job pre-emption using the maui
preemptor/preemptee queue mechanism, with openmpi jobs being pre-empted
using blcr. Almost all of this works, and my thanks are due to Eric Roman
of LBL, who has written a procedure, cr_mpirun, to facilitate the
openmpi-blcr interaction, which he intends to release soon. I believe that
the remaining problem is caused by a maui-torque interaction, on which I am
seeking advice here.

 

The problem is that for some jobs, identified below, the job on which a
hold has been placed restarts instantly and is checkpointed again. As a
result, the time stamp on the ckpt file does not match what was
expected, so the pre-empted job goes into the W state and the preemptor
cannot start. (The other problem, incomplete .o files, is minor by
comparison, because the simple workaround identified below will suffice
until the problem is fixed.) I have experimented unsuccessfully with a few
modifications to maui, as indicated below, and was hoping for some advice
on what else I might try. I would also be interested to know whether the
moab-torque-openmpi-blcr combination is working for anyone.

Thank you. Greg Doherty 

         
        A job asking for m nodes can pre-empt jobs with p<=m nodes, and
        restart the original jobs. However, the .o files of the pre-empted
        jobs do not contain the output produced prior to them being
        checkpointed. For the time being, this can be circumvented by
        redirecting stdout to a file when executing the cr_mpirun command,
        which works OK.
         
        A job asking for m nodes cannot successfully pre-empt a job already
        running with p>m nodes. I believe that this is because the pre-empted
        job restarts immediately so the ckpt files have labels which don't
        match. I tried to modify MPBSI.c to stop the pre-empted job from
        restarting immediately, by adding a pbs_alterjob between the
        pbs_holdjob and the pbs_rlsjob to delay the execution of the
        pre-empted job by one minute, but that simply fails with a pbs_error
        of 15016:
         
        04/18 16:21:35 MRMJobCheckpoint(1272,1,SC)
        04/18 16:21:35 MPBSJobCkpt(1272,R,SC)
        04/18 16:21:37 MPBSJobCkpt(Execution_Time, 1622.37)
        04/18 16:21:37 MPBSJobCkpt(Illegal attribute or resource value for )
        04/18 16:21:37 ERROR: PBS job '1272.liberty.ansto.gov.au' attr 'Execution_Time:' to '1622.37' (rc: 15016 'Illegal attribute or resource value for ')
        04/18 16:21:37 INFO:     attribute 'PREEMPTEE' set for job 1272 
         
         
        So, obviously I don't know what I am doing. I have fiddled with
        various strings to include the month and day when trying to reset
        the execution time, but to no avail. Probably pbs_alterjob does not
        want me to fiddle with execution time at all at this point in
        proceedings. I can't find very much detailed documentation on those
        attributes. I have also experimented with short sleep()s between the
        pbs_holdjob and pbs_rlsjob, to no avail.
         
        I enclose the following in case you can see immediately that I have
        done something stupid.
        
-------------------------------------------------------------------
         
        int MPBSJobCkpt(
         
          mjob_t  *J,    /* I (modified) */
          mrm_t   *R,    /* I */
          mbool_t  DoTerminateJob, /* I (boolean) */
          char    *Msg,  /* O (optional) */
          int     *SC)   /* O (optional) */
         
          {
          struct attrl Ckattrib;
         
          char          *CkRptr;
          time_t        Cktime;
          struct tm     *Cktmp;
          char          Cktmps[256];
          char          Cktmpline[MAX_MLINE];
         
          Ckattrib.next = NULL;
          Ckattrib.name = ATTR_a;
          Ckattrib.op = SET;
         
          Cktmpline[0] = '\0';
          CkRptr = Cktmpline;
          Ckattrib.resource = CkRptr;
         
          int   rc;
          int   holdtimeout;
         
          char *ErrMsg;
         
          char tmpJobName[MAX_MNAME];
         
          const char *FName = "MPBSJobCkpt";
         
          DBG(2,fPBS) DPrint("%s(%s,R,SC)\n",
            FName,
            (J != NULL) ? J->Name : "NULL");
         
          if ((J == NULL) ||
              (R == NULL) ||
             ((J->State != mjsStarting) && (J->State != mjsRunning)))
            {
            return(FAILURE);
            }
         
          MJobGetName(J,NULL,R,tmpJobName,sizeof(tmpJobName),mjnRMName);
         
          rc = blocking_pbs_holdjob(R->U.PBS.ServerSD,tmpJobName,"s",NULL);
          /* still ok to release the job if the hold timed out, the request
           * was successful.  */
          if (rc != -2) { holdtimeout = 0; } else { holdtimeout = 1; }
         
          if (rc != 0 && !holdtimeout)
            {
            ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);
         
            DBG(0,fPBS) DPrint("ERROR:    PBS job '%s' cannot be checkpointed (rc: %d  '%s')\n",
              J->Name,
              rc,
              ErrMsg);
         
            if (R->FailIteration != MSched.Iteration)
              {
              R->FailIteration = MSched.Iteration;
              R->FailCount     = 0;
              }
         
            R->FailCount++;
         
            return(FAILURE);
            }
         
          for (rc=0; rc<256; rc++) {
                Cktmps[rc] = '\0';
          }
         
          Cktime = time(NULL);
          Cktime += 60;
          Cktmp = localtime(&Cktime);
         
          if (strftime(Cktmps, sizeof(Cktmps), "%m%d%H%M.%S", Cktmp) == 0) {
            /* time_t is not an int: cast to long and print with %ld */
            DBG(0,fPBS) DPrint("ERROR: Greg's checkpoint addition %ld\n",
              (long)Cktime);
            return(FAILURE);
          }
          Ckattrib.value = Cktmps;
          DBG(2,fPBS) DPrint("%s(%s, %s)\n",
            FName, Ckattrib.name, Ckattrib.value);
         
          rc = pbs_alterjob(R->U.PBS.ServerSD, tmpJobName, &Ckattrib, NULL);
          ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);
          DBG(2,fPBS) DPrint("%s(%s)\n",
            FName, ErrMsg);
         
          if (rc != 0)
            {
            ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);
         
            DBG(2,fPBS) DPrint("ERROR: PBS job '%s' attr '%s:%s' to '%s' (rc: %d '%s')\n",
              tmpJobName,
              Ckattrib.name,
              Ckattrib.resource,
              Ckattrib.value,
              rc,
              ErrMsg);
        /* If I do not comment this bit out, maui simply stops of course
           and I do not even get to see all the debug messages in the log
           file.

         
            if (R->FailIteration != MSched.Iteration)
              {
              R->FailIteration = MSched.Iteration;
              R->FailCount     = 0;
              }
         
            R->FailCount++;
         
            return(FAILURE);
        */
            }
         
          rc = pbs_rlsjob(R->U.PBS.ServerSD,tmpJobName,"s",NULL);
         
          if (rc != 0)
            {
            ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);
         
            DBG(0,fPBS) DPrint("ERROR:    PBS job '%s' cannot be released from hold (rc: %d  '%s')\n",
              J->Name,
              rc,
              ErrMsg);
         
            if (R->FailIteration != MSched.Iteration)
              {
              R->FailIteration = MSched.Iteration;
              R->FailCount     = 0;
              }
         
            R->FailCount++;
         
            return(FAILURE);
            }
         
          if (holdtimeout) { return(FAILURE); }
         
          /* NOTE:  'DoTerminateJob' flag not supported */
         
          DBG(7,fPBS) DPrint("INFO:     job '%s' checkpointed\n",
            J->Name);
         
          return(SUCCESS);
          }  /* END MPBSJobCkpt() */
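
One possibility that may be worth checking, offered as an assumption to verify against the torque source rather than as a confirmed fix: it appears that the qalter client converts its -a date_time argument to seconds since the epoch before it calls pbs_alterjob, so the server may expect the Execution_Time value as a decimal epoch string rather than in the [[[[CC]YY]MM]DD]hhmm[.SS] form typed on the command line. The sketch below shows both encodings side by side. The helper names ck_format_epoch and ck_format_datetime are hypothetical and are not part of maui or torque.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not a maui or torque function): writes `when` as
 * decimal seconds since the epoch, e.g. "1208500955".
 * ASSUMPTION: this is the form the server wants for Execution_Time.
 * Returns 0 on success, -1 if the buffer is too small. */
int ck_format_epoch(time_t when, char *buf, size_t buflen)
{
  int n = snprintf(buf, buflen, "%ld", (long)when);

  return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Hypothetical helper: writes the local time `when` as MMDDhhmm.SS, the
 * same strftime() format used in the listing above.
 * Returns 0 on success, -1 on failure. */
int ck_format_datetime(time_t when, char *buf, size_t buflen)
{
  struct tm *tmp = localtime(&when);

  if (tmp == NULL)
    return -1;

  return (strftime(buf, buflen, "%m%d%H%M.%S", tmp) != 0) ? 0 : -1;
}
```

In MPBSJobCkpt() the first helper could be tried in place of the strftime() call, for example ck_format_epoch(time(NULL) + 60, Cktmps, sizeof(Cktmps)), leaving the rest of the pbs_alterjob call unchanged.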

 
