Thanks for the pointer, Charles.  Indeed, the job IDs didn't match.
After digging more into the logs and code, I noticed that this was
caused by the following:

- Globus checks for the 'condorness' of a local resource manager in
ManagedExecutableJobResource.java.  This checks the name of the resource
manager to see if it matches
ManagedJobFactoryConstants.FACTORY_TYPE.CONDOR.  Since my new local
resource manager is named 'foo', the above check fails and because of
that the 'emitCondorProcesses' attribute is never set.

- Since emitCondorProcesses is not set, emit_condor_processes() returns
false in the following piece of code in foo.in, so the inner block is skipped:

        if($job_id ne '')
        {
            $status = Globus::GRAM::JobState::PENDING;

            if ($description->emit_condor_processes()) {
                $job_id = join(',',
                    map { sprintf("%03d.%03d.%03d", $job_id, $_, 0) }
                        (0..($description->count()-1)));
            }
            return {JOB_STATE => Globus::GRAM::JobState::PENDING,
                JOB_ID    => $job_id};
        }

Basically, the job ID is not transformed into the Condor-style
xxx.yyy.zzz format that the SEG reports, so WS-GRAM cannot match the SEG
events to the job and the job state notifications do not work.
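
To illustrate the mismatch, here is a minimal standalone sketch of the
formatting expression from foo.in.  The helper name condor_style_ids and
the sample cluster ID 98 are assumptions made for the example; they are
not part of the job manager.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the join/map/sprintf expression from foo.in.
# $cluster_id is the raw ID returned at submit time, $count the number of
# processes in the job.
sub condor_style_ids {
    my ($cluster_id, $count) = @_;
    return join(',',
        map { sprintf("%03d.%03d.%03d", $cluster_id, $_, 0) }
            (0 .. $count - 1));
}

print condor_style_ids(98, 1), "\n";   # prints 098.000.000
print condor_style_ids(98, 3), "\n";   # prints 098.000.000,098.001.000,098.002.000
```

The first result has the same shape as the IDs in the SEG output
(098.000.000), whereas the untransformed ID does not.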

To fix this, I simply removed the emit_condor_processes() condition in the
code above and re-installed my foo job manager.  Job notifications are
working fine now.
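
For reference, a rough sketch of the submit branch with the condition
removed.  It is written as a standalone function so it runs without Globus
installed: submit_result, the PENDING placeholder and the two arguments
stand in for $description and Globus::GRAM::JobState, and are assumptions
for illustration only.

```perl
use strict;
use warnings;

# Placeholder for Globus::GRAM::JobState::PENDING (illustrative value only).
use constant PENDING => 'PENDING';

# Hypothetical stand-in for the submit branch in foo.in, with the
# emit_condor_processes() check removed so the ID is always reformatted.
sub submit_result {
    my ($job_id, $count) = @_;

    # Mirrors the original guard: only build a result when an ID came back.
    return unless $job_id ne '';

    $job_id = join(',',
        map { sprintf("%03d.%03d.%03d", $job_id, $_, 0) } (0 .. $count - 1));

    return { JOB_STATE => PENDING, JOB_ID => $job_id };
}

my $result = submit_result('98', 2);
print "$result->{JOB_ID}\n";   # prints 098.000.000,098.001.000
```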


Thanks again for the feedback.

Regards,
  Andre

Charles Bacon wrote:
> Are you sure that the Job IDs referenced in the SEG output (looks like
> xxx.yyy.zzz) match the Job IDs that WS-GRAM thinks it has gotten back
> from the perl jobmanagers?  I've done one of these second-condor
> jobmanagers before for OSG's ManagedFork jobmanager, and there was some
> problem where the scripts were reporting xxx.0, but the SEG was
> reporting on xxx.000.000.  WS-GRAM won't realize that those are supposed
> to be the same, so you can either modify the behavior of your foo.pm or
> your SEG so they match up.  If it's not obvious at your current level of
> logging, bump the GRAM logging up to DEBUG in the
> container-log4j.properties file.
> 
> 
> 
> Charles
> 
> On Feb 16, 2009, at 11:28 AM, Andre Charbonneau wrote:
> 
>> Greetings,
>> I'm currently trying to deploy a new job manager and scheduler event
>> generator and I'm having some problems.
>> Basically, what I am trying to do is to have a second condor job manager
>> and scheduler interface and SEG module, but with a different name (foo).
>> To get started, I simply cloned the code from the existing condor job
>> manager, scheduler provider and SEG module and changed the names in the
>> various files to refer to 'foo' instead of 'condor'.
>>
>> So far, I'm able to submit my job and the job runs to completion.  The
>> problem I'm having is that the globusrun-ws client does not seem to get
>> any notifications, even though my SEG module seems to be working fine.
>> It simply waits forever after I submit my job.  For example:
>>
>> globusrun-ws -submit -s -Ft Foo -Jf creds.epr -Sf creds.epr -Tf
>> creds.epr -F ******* -f myjob.xml
>> Submitting job...Done.
>> Job ID: uuid:f4df4ba2-fc4d-11dd-8f66-00b0d0e1435d
>> Termination time: 02/17/2009 17:19 GMT
>> Current job state: Unsubmitted
>>
>>
>>
>> I checked if my SEG module is running and it looks ok:
>>
>> ps -ef |grep globus-scheduler-event-generator
>> globus   26288 26229  0 11:53 ?        00:00:00
>> /usr/local/globus/libexec/globus-scheduler-event-generator -s foo -t
>> 1234802181
>> globus   26300 26229  0 11:53 ?        00:00:00
>> /usr/local/globus/libexec/globus-scheduler-event-generator -s fork -t
>> 1234801181
>> globus   26744 26229  0 12:10 ?        00:00:00
>> /usr/local/globus/libexec/globus-scheduler-event-generator -s condor -t
>> 1234803725
>> globus   26748  2895  0 12:10 pts/0    00:00:00 grep
>> globus-scheduler-event-generator
>>
>>
>>
>>
>> And when I run it by hand, it looks like it is behaving OK too:
>>
>> 001;1234804765;098.000.000;1;0
>> 001;1234804806;098.000.000;2;0
>> 001;1234804823;098.000.000;8;0
>>
>>
>> I've compared my code with the one from the Condor job manager and I
>> can't find what I'm missing.
>>
>> Has anyone else had similar issues when deploying their custom-made job
>> managers?  Is anything other than the SEG module required for the
>> job state notifications to be properly sent to the client?
>>
>> (I'm using gt 4.0.8)
>>
>>
>> Thanks,
>>     Andre


-- 
Andre Charbonneau
Research Computing Support, IMSB
National Research Council Canada
100 Sussex Drive, Rm 2025
Ottawa, ON, Canada K1A 0R6
613 993-3129
