Dear all,

I could not check this earlier because the machines are heavily used. I
have come to the conclusion that something is wrong with GT 4.0.5. Please
correct me if I am wrong. We have two types of servers: romeo (IA-64,
Globus 4.0.2, Java 1.4.2) and hector (x86-64, Globus 4.0.4, Java 1.6).
Both run fine with these older versions of GT, and both have the same
entries in their sudoers file:

#Globus GRAM entries
globus  ALL=(ALL,!root) NOPASSWD:
/opt/globus/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus/libexec/globus-job-manager-script.pl *
globus  ALL=(ALL,!root) NOPASSWD:
/opt/globus/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus/libexec/globus-gram-local-proxy-tool *

GLOBUS_LOCATION is /opt/globus, which is a symlink. sudo had no problems
with the symlink in the older versions. I have now compiled and installed
GT 4.0.5 into a different directory and changed the symlink to point to
the new location. Everything (gsissh, GridFTP, and RFT) works except job
submission. Here is what the client shows:

[EMAIL PROTECTED]:~> globusrun-ws -submit -S -F
https://romeo.urz.tu-dresden.de:8443/wsrf/services/ManagedJobFactoryService
-s -c /bin/date
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:2cc1e60a-70be-11dc-b9e1-080069149999
Termination time: 10/03/2007 08:05 GMT
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Error code: 200
Sudo is misconfigured to run the globus-job-manager-script.pl script for
user zimd0022.

[EMAIL PROTECTED]:~>

*The entries in romeo's container log are below: *

2007-10-02 10:05:04,977 DEBUG exec.StateMachine
[RunQueueThread_5,runScript:2987] running script submit
2007-10-02 10:05:04,998 DEBUG exec.JobManagerScript [Thread-18,run:208]
Executing command:
/usr/bin/sudo -H -u zimd0022 -S
/opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f
/opt/globus-4.0.5/tmp/gram_job_mgr52269.tmp -c submit
2007-10-02 10:05:05,160 DEBUG exec.JobManagerScript [Thread-18,run:225]
first line: null
2007-10-02 10:05:05,161 DEBUG exec.JobManagerScript [Thread-18,run:335]
failure message: Sudo is misconfigured to run the
globus-job-manager-script.pl script for user zimd0022.
2007-10-02 10:05:05,162 DEBUG exec.JobManagerScript
[Thread-18,setDone:345] script is done, setting done flag
2007-10-02 10:05:05,163 DEBUG exec.StateMachine
[RunQueueThread_5,processSubmitState:1168] Done waiting for submit script
2007-10-02 10:05:05,164 DEBUG exec.StateMachine
[RunQueueThread_5,processSubmitState:1176] script return code: 200
2007-10-02 10:05:05,165 DEBUG exec.StateMachine
[RunQueueThread_5,processSubmitState:1181] script return code means error!
2007-10-02 10:05:05,177 DEBUG exec.StateMachine
[RunQueueThread_5,createFaultFromErrorCode:3131] Creating fault from
error code 200
2007-10-02 10:05:05,177 WARN  exec.StateMachine
[RunQueueThread_5,createFaultFromErrorCode:3270] Unhandled fault code 200
2007-10-02 10:05:05,178 DEBUG exec.StateMachine
[RunQueueThread_5,createFaultFromErrorCode:3271] Offending Script
Command: submit
2007-10-02 10:05:05,184 DEBUG utils.FaultUtils
[RunQueueThread_5,createFault:422] Script Command: submit
2007-10-02 10:05:05,196 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:460] Fault Class: class
org.globus.exec.generated.FaultType
2007-10-02 10:05:05,196 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:461] Resource Key:
{http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=2e3cb5a0-70be-11dc-b1ea-b1743b772918
2007-10-02 10:05:05,196 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:462] Description: Error code: 200
2007-10-02 10:05:05,197 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:463] Cause: java.lang.Exception: Sudo is
misconfigured to run the globus-job-manager-script.pl script for user
zimd0022.
2007-10-02 10:05:05,197 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:464] State when failure occurred Unsubmitted
2007-10-02 10:05:05,197 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:466] Script Command: submit
2007-10-02 10:05:05,198 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:467] GT2 Error Code: 200
2007-10-02 10:05:05,225 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:514] setting fault cause
2007-10-02 10:05:05,227 DEBUG utils.FaultUtils
[RunQueueThread_5,makeFault:519] Script Command: submit

*So I ran the sudo command directly and entered the globus password: *

[EMAIL PROTECTED]:~> /usr/bin/sudo -H -u zimd0022 -S
/opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f
/opt/globus-4.0.5/tmp/gram_job_mgr32745.tmp -c submit
Password:
Sorry, try again.
Password:
Sorry, try again.
Password:
Sorry, try again.
/usr/bin/sudo: 3 incorrect password attempts
[EMAIL PROTECTED]:~>
[EMAIL PROTECTED]:~> ls -altrh /opt/
total 24K
...........
drwxr-xr-x  16 globus   globus 4.0K 2007-09-27 15:38 globus-4.0.5
drwxr-xr-x  16 globus   globus 4.0K 2007-10-01 11:47 globus-4.0.2
lrwxrwxrwx   1 root     root     12 2007-10-01 11:59 globus -> globus-4.0.5
drwxr-xr-x  17 root     root   4.0K 2007-10-01 11:59 .
[EMAIL PROTECTED]:~>  
                           
*Now on hector: *

[EMAIL PROTECTED]:~> globusrun-ws -submit -S -F
https://hector.zih.tu-dresden.de:8443/wsrf/services/ManagedJobFactoryService
-s -c /bin/date
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:a9a0a49a-70be-11dc-bf1f-080069149999
Termination time: 10/03/2007 08:08 GMT
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Error code: 201
Script stderr:
zimd0022's password:

[EMAIL PROTECTED]:~>

*The entries in container log: *

2007-10-02 10:08:32,890 DEBUG exec.StateMachine
[RunQueueThread_13,runScript:2987] running script submit
2007-10-02 10:08:32,890 DEBUG exec.JobManagerScript [Thread-18,run:208]
Executing command:
/usr/bin/sudo -H -u zimd0022 -S
/opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f
/opt/globus-4.0.5/tmp/gram_job_mgr63230.tmp -c submit
2007-10-02 10:08:32,983 DEBUG exec.JobManagerScript [Thread-18,run:225]
first line: null
2007-10-02 10:08:32,984 DEBUG exec.JobManagerScript [Thread-18,run:335]
failure message: Script stderr:
zimd0022's password:
2007-10-02 10:08:32,984 DEBUG exec.JobManagerScript
[Thread-18,setDone:345] script is done, setting done flag
2007-10-02 10:08:32,985 DEBUG exec.StateMachine
[RunQueueThread_13,processSubmitState:1168] Done waiting for submit script
2007-10-02 10:08:32,986 DEBUG exec.StateMachine
[RunQueueThread_13,processSubmitState:1176] script return code: 201
2007-10-02 10:08:32,986 DEBUG exec.StateMachine
[RunQueueThread_13,processSubmitState:1181] script return code means error!
2007-10-02 10:08:32,986 DEBUG exec.StateMachine
[RunQueueThread_13,createFaultFromErrorCode:3131] Creating fault from
error code 201
2007-10-02 10:08:32,986 WARN  exec.StateMachine
[RunQueueThread_13,createFaultFromErrorCode:3270] Unhandled fault code 201
2007-10-02 10:08:32,987 DEBUG exec.StateMachine
[RunQueueThread_13,createFaultFromErrorCode:3271] Offending Script
Command: submit
2007-10-02 10:08:32,991 DEBUG utils.FaultUtils
[RunQueueThread_13,createFault:422] Script Command: submit
2007-10-02 10:08:32,997 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:460] Fault Class: class
org.globus.exec.generated.FaultType
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:461] Resource Key:
{http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=aaadb440-70be-11dc-bd72-842811339d49
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:462] Description: Error code: 201
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:463] Cause: java.lang.Exception: Script stderr:
zimd0022's password:
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:464] State when failure occurred Unsubmitted
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:466] Script Command: submit
2007-10-02 10:08:32,999 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:467] GT2 Error Code: 201
2007-10-02 10:08:33,006 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:514] setting fault cause
2007-10-02 10:08:33,007 DEBUG utils.FaultUtils
[RunQueueThread_13,makeFault:519] Script Command: submit

*Result of running the sudo command directly: *

[EMAIL PROTECTED]:~> /usr/bin/sudo -H -u zimd0022 -S
/opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f
/opt/globus-4.0.5/tmp/gram_job_mgr63230.tmp -c submit
Password:
Sorry, try again.
Password:
Sorry, try again.
Password:
Sorry, try again.
/usr/bin/sudo: 3 incorrect password attempts
[EMAIL PROTECTED]:~>

[EMAIL PROTECTED]:~> ls -altrh /opt/
total 16K
...............
drwxr-xr-x 16 globus   globus   4.0K 2007-09-20 01:43 globus-4.0.4
lrwxrwxrwx  1 root     root       12 2007-10-02 09:49 globus -> globus-4.0.5
drwxr-xr-x 16 globus   globus   4.0K 2007-10-02 09:50 globus-4.0.5
[EMAIL PROTECTED]:~>

So I pointed the globus symlink back to globus-4.0.4.

hector:/opt # rm globus
hector:/opt # ln -s globus-4.0.4 globus
hector:/opt #  ls -altrh
total 16K
.......
drwxr-xr-x 16 globus   globus   4.0K 2007-09-20 01:43 globus-4.0.4
drwxr-xr-x 16 globus   globus   4.0K 2007-10-02 09:50 globus-4.0.5
lrwxrwxrwx  1 root     root       12 2007-10-02 10:15 globus -> globus-4.0.4
hector:/opt #

*At client: *

[EMAIL PROTECTED]:~> globusrun-ws -submit -S -F
https://hector.zih.tu-dresden.de:8443/wsrf/services/ManagedJobFactoryService
-s -c /bin/date
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:d39adff8-70bf-11dc-a4b2-080069149999
Termination time: 10/03/2007 08:16 GMT
Current job state: Active
Current job state: CleanUp-Hold
Tue Oct  2 10:16:52 CEST 2007
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[EMAIL PROTECTED]:~>

*Container log: *

2007-10-02 10:16:52,525 DEBUG exec.StateMachine
[RunQueueThread_0,runScript:2883] running script submit
2007-10-02 10:16:52,525 DEBUG exec.JobManagerScript [Thread-14,run:208]
Executing command:
/usr/bin/sudo -H -u zimd0022 -S
/opt/globus/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/opt/globus/libexec/globus-job-manager-script.pl -m fork -f
/opt/globus/tmp/gram_job_mgr18847.tmp -c submit
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:225]
first line: GRAM_SCRIPT_JOB_ID:d4e0e07e-70bf-11dc-aba6-00151714069c:1045
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:228]
Read line: GRAM_SCRIPT_JOB_ID:d4e0e07e-70bf-11dc-aba6-00151714069c:1045
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:240]
Received local job ID d4e0e07e-70bf-11dc-aba6-00151714069c:1045
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:228]
Read line: GRAM_SCRIPT_JOB_STATE:2
2007-10-02 10:16:52,687 DEBUG exec.JobManagerScript [Thread-14,run:335]
failure message: null
2007-10-02 10:16:52,688 DEBUG exec.JobManagerScript
[Thread-14,setDone:345] script is done, setting done flag
2007-10-02 10:16:52,688 DEBUG exec.StateMachine
[RunQueueThread_0,processSubmitState:1105] Done waiting for submit script
2007-10-02 10:16:52,688 DEBUG exec.StateMachine
[RunQueueThread_0,processSubmitState:1129] script return code: 0
2007-10-02 10:16:52,689 DEBUG exec.StateMachine
[RunQueueThread_0,processSubmitState:1161] script returned job state: Active
2007-10-02 10:16:52,690 DEBUG
ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999
[RunQueueThread_0,getResourceDatum:217] getting resource datum localJobId
2007-10-02 10:16:52,690 DEBUG
ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999
[RunQueueThread_0,getResourceDatum:223] Obtaining lock on resourceData
2007-10-02 10:16:52,690 DEBUG
ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999
[RunQueueThread_0,getResourceDatum:226] Obtained lock on resourceData
2007-10-02 10:16:52,690 DEBUG
ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999
[RunQueueThread_0,getResourceDatum:266] Releasing lock on resourceData

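The only difference I can see is that with GT 4.0.4 the symlink is not
resolved in the executed command (/opt/globus/...), whereas with GT 4.0.5
it is resolved (/opt/globus-4.0.5/...). With 4.0.4, sudo works perfectly,
as you can see. Sorry for the long mail. Any tips would be appreciated.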
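
One possible reading of the two container logs, as a rough sketch only: assuming sudo compares the requested command line textually against the sudoers entry, with `*` acting as a shell-style wildcard, the command logged under GT 4.0.4 fits the entry quoted at the top of this mail, while the one logged under GT 4.0.5 does not. The helper name `is_allowed` is made up for illustration, and Python's `fnmatchcase` only approximates whatever matching sudo actually performs.

```python
from fnmatch import fnmatchcase

# Command part of the first sudoers entry from this mail, joined onto one line.
SUDOERS_PATTERN = (
    "/opt/globus/libexec/globus-gridmap-and-execute -g "
    "/etc/grid-security/grid-mapfile "
    "/opt/globus/libexec/globus-job-manager-script.pl *"
)


def is_allowed(command: str, pattern: str = SUDOERS_PATTERN) -> bool:
    """Hypothetical helper: glob-style, case-sensitive match of a command line."""
    return fnmatchcase(command, pattern)


# Command from the hector container log under GT 4.0.4 (symlink path kept).
cmd_404 = (
    "/opt/globus/libexec/globus-gridmap-and-execute -g "
    "/etc/grid-security/grid-mapfile "
    "/opt/globus/libexec/globus-job-manager-script.pl -m fork -f "
    "/opt/globus/tmp/gram_job_mgr18847.tmp -c submit"
)

# Command from the hector container log under GT 4.0.5 (symlink resolved).
cmd_405 = (
    "/opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g "
    "/etc/grid-security/grid-mapfile "
    "/opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f "
    "/opt/globus-4.0.5/tmp/gram_job_mgr63230.tmp -c submit"
)

print("4.0.4 command matches entry:", is_allowed(cmd_404))
print("4.0.5 command matches entry:", is_allowed(cmd_405))
```

If that assumption holds, it would be consistent with the password prompts shown above, since no NOPASSWD entry would apply to the resolved path. It does not explain why 4.0.5 resolves the symlink in the first place.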

Cheers,
Samatha


-- 
Samatha Kottha
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Technische Universität Dresden                  Tel: (+49) 351 463-38776
Room 1019                                       Fax: (+49) 351 463-38245
Noethnitzer Straße 46 
01187 Dresden
Germany 
