Hello,

What I have seen as well is that if I send a job to the fork jobmanager and to my jobmanager, I get the same logs in the gram_job_mgr files. For the fork jobmanager it works, while for mine it does not:

# tail /home/atlas082/gram_job_mgr_6565.log
7/29 09:30:35 JMI: completed script validation: job manager type is mmaa.
7/29 09:30:35 JMI: cmd = cache_cleanup
7/29 09:30:35 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_CACHE_CLEAN_UP
7/29 09:30:35 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_RESPONSE
7/29 09:30:35 JM: before sending to client: rc=0 (Success)
7/29 09:30:35 Job Manager State Machine (exiting): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
7/29 09:30:35 JM: in globus_gram_job_manager_reporting_file_remove()
7/29 09:30:35 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
7/29 09:30:35 JM: in globus_gram_job_manager_reporting_file_remove()
7/29 09:30:35 JM: exiting globus_gram_job_manager.

# tail /home/atlas082/gram_job_mgr_6800.log
7/29 09:32:36 JMI: completed script validation: job manager type is mmaa.
7/29 09:32:36 JMI: cmd = cache_cleanup
7/29 09:32:36 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_CACHE_CLEAN_UP
7/29 09:32:36 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_RESPONSE
7/29 09:32:36 JM: before sending to client: rc=0 (Success)
7/29 09:32:36 Job Manager State Machine (exiting): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
7/29 09:32:36 JM: in globus_gram_job_manager_reporting_file_remove()
7/29 09:32:36 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
7/29 09:32:36 JM: in globus_gram_job_manager_reporting_file_remove()
7/29 09:32:36 JM: exiting globus_gram_job_manager.

# globus-job-run myce/jobmanager-fork /usr/bin/whoami
atlas082
# globus-job-run myce/jobmanager-mmaa /usr/bin/whoami
GRAM Job submission failed because the job manager detected an invalid script response (error code 24)

Any ideas?
Thanks so much!
Carlos
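
Because the two logs are compared by eye here, it may help to strip the timestamps and diff only the state-machine lines. The following is a minimal sketch that assumes the log format shown above; the function name jm_states is hypothetical and is not part of Globus:

```shell
#!/bin/sh
# Sketch only: jm_states is a hypothetical helper, not a Globus tool.
# It prints the "Job Manager State Machine" lines of a GRAM job manager
# log with the leading "M/D HH:MM:SS " timestamp removed, so that two
# runs can be compared line by line.
jm_states() {
    grep 'Job Manager State Machine' "$1" |
        sed -E 's/^[0-9]+\/[0-9]+ [0-9:]+ //'
}

# When called with two log files, diff their state sequences.
if [ "$#" -eq 2 ]; then
    a=$(mktemp)
    b=$(mktemp)
    jm_states "$1" > "$a"
    jm_states "$2" > "$b"
    if diff "$a" "$b"; then
        echo "state sequences are identical"
    fi
    rm -f "$a" "$b"
fi
```

Saved under a name of your choice and run with the two gram_job_mgr_*.log files as arguments, it prints either the differing transitions or a note that the sequences match. If they match, the logs alone do not separate the working run from the failing one, and the difference probably has to be looked for elsewhere, for example in what the mmaa.pm script returns.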

On Wed, Jul 28, 2010 at 8:57 AM, Carlos Borrego Iglesias <[email protected]> wrote:

> Thanks so much Joseph,
> Yes, I have tried to manually run it and it works:
>
> # perl -I /opt/globus/lib/perl /opt/globus/lib/perl/Globus/GRAM/JobManager/mmaa.pm
> # echo $?
> 0
>
> I am running globus-version 4.0.3, so I presume that I can not add the rsl
> relation save_job_description = yes to the job you are commenting:
>
> # globus-version
> 4.0.3
>
> Thanks so much for any other hint that could help me
> Cheers
> Carlos
>
> On Wed, Jul 28, 2010 at 1:23 AM, Joseph Bester <[email protected]> wrote:
>
>> On Jul 27, 2010, at 9:13 AM, Carlos Borrego Iglesias wrote:
>>
>> > Hello,
>> > I am trying to define a new globus-job-manager called jobmanager-mmaa:
>> >
>> > when I do a globus-job-run I get the next message:
>> >
>> > #globus-job-run myce.pic.es/jobmanager-mmaa /bin/hostname
>> > GRAM Job submission failed because the job manager detected an invalid script response (error code 24)
>> >
>> > In the gatekeeper log I see no errors:
>> >
>>
>> Does this mean you are creating a new LRM script?
>>
>> If so, the first step of debugging is to make sure it runs with
>>
>> perl -I$GLOBUS_LOCATION/lib/perl $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/mmaa.pm
>>
>> (assuming mmaa.pm is your module name)
>>
>> After that, if you are using 5.0.2, you can add the rsl relation
>> save_job_description = yes to a job submission. This will leave a perl
>> file in your home directory that you can run with the job manager script.
>> That file is called gram_$unique.pl where $unique is a string of
>> characters unique for each job. Pass that to the script to see what's
>> going on:
>>
>> $GLOBUS_LOCATION/libexec/globus-job-manager-script.pl -m mmaa -f ~/gram_UNIQUE.pl -c submit
>>
>> > Successfull mapping done
>> > Mapping service "LCMAPS" returned local user "atlas007"
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=9
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 5: Requested service: jobmanager-mmaa
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 5: Authorized as local user: atlas007
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 5: Authorized as local uid: 31057
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 5: and local gid: 1307
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 5: "/DC=es/DC=irisgrid/O=ifae/CN=carlos.borrego" mapped to atlas007 (31057/1307)
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 0: executing /opt/globus//libexec/globus-job-manager
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 0: GATEKEEPER_JM_ID 2010-07-27.15:08:22.0000026517.0000000000 for /DC=es/DC=irisgrid/O=ifae/CN=carlos.borrego on 193.109.175.133
>> > JMA 2010/07/27 15:08:22 GATEKEEPER_JM_ID 2010-07-27.15:08:22.0000026517.0000000000 has EDG_WL_JOBID ''
>> > GATEKEEPER_DGAS_FD=8 (/opt/edg/var/gatekeeper/jobs/2010-07-27.15:08:22.0000014805.0000000014)
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=12
>> > TIME: Tue Jul 27 15:08:22 2010
>> > PID: 26517 -- Notice: 0: Child 26519 started
>> > JMA 2010/07/27 15:08:22 GATEKEEPER_JM_ID 2010-07-27.15:08:22.0000026517.0000000000 JM exiting
>> >
>> > In the gram job manager log file from the user which is mapped to I get:
>> >
>> > [atlas...@myce ~]$ tail gram_job_mgr_22248.log
>> > 7/27 14:33:07 JMI: completed script validation: job manager type is mmaa.
>> > 7/27 14:33:07 JMI: cmd = cache_cleanup
>> > 7/27 14:33:07 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_CACHE_CLEAN_UP
>> > 7/27 14:33:07 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_RESPONSE
>> > 7/27 14:33:07 JM: before sending to client: rc=0 (Success)
>> > 7/27 14:33:07 Job Manager State Machine (exiting): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
>> > 7/27 14:33:07 JM: in globus_gram_job_manager_reporting_file_remove()
>> > 7/27 14:33:07 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
>> > 7/27 14:33:07 JM: in globus_gram_job_manager_reporting_file_remove()
>> > 7/27 14:33:07 JM: exiting globus_gram_job_manager.
>> >
>> > Any ideas where can I get more debug info?
>> > Thanks so much in advance
>> > Carlos
>> >
>> > --
>> > =============================
>> > Carlos Borrego Iglesias
>> > [email protected]
>> > IFAE Institut de Física d'Altes Energies
>> > Campus UAB Edifici Cn. Facultat Ciències
>> > E-08193 Bellaterra
>> > tel: +34 93 581 2822
>> > =============================
>>
>
> --
> =============================
> Carlos Borrego Iglesias
> [email protected]
> IFAE Institut de Física d'Altes Energies
> Campus UAB Edifici Cn. Facultat Ciències
> E-08193 Bellaterra
> tel: +34 93 581 2822
> =============================
>

--
=============================
Carlos Borrego Iglesias
[email protected]
IFAE Institut de Física d'Altes Energies
Campus UAB Edifici Cn. Facultat Ciències
E-08193 Bellaterra
tel: +34 93 581 2822
=============================
