Hi,
I was having a problem submitting jobs with the native GT5 globusrun
command. When I submitted a job, the job state would never progress
(regardless of whether I used the SEG).
When I tried the same thing with the globusrun command from the old
GT2 client (Globus 4.0.8/VDT 1.10, globusrun "4.7") against a GT5.0.0
server, everything worked fine:
$ globusrun -r ng1 '&(executable=/bin/hostname)'
> globus_gram_client_callback_allow successful
> GRAM Job submission successful
> GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
> GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
> GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
$ globusrun -o -r ng1 '&(executable=/bin/hostname)'
> ngcompute.canterbury.ac.nz
I compared the GRAM logs from the two runs and could not find a
difference.
Finally, I tried using an RFC 3820 proxy certificate, and job
submission started to work. Previously, I had been fetching a proxy
certificate from MyProxy, and it was a "full legacy globus proxy".
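For anyone who wants to check which kind of proxy they are holding, here is a rough sketch of what I mean. It assumes grid-proxy-info and grid-proxy-init are on the PATH, and that an RFC proxy is reported with a type string containing "RFC 3820" (the legacy one is reported as "full legacy globus proxy", as quoted above). The helper name is_rfc_proxy_type is made up for this example.

```shell
# Hypothetical helper: succeeds if the given type string (as printed by
# `grid-proxy-info -type`) looks like an RFC 3820 proxy. The "RFC 3820"
# substring match is an assumption about the tool's wording.
is_rfc_proxy_type() {
    case "$1" in
        *"RFC 3820"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query the local proxy if the Globus client tools are installed.
if command -v grid-proxy-info >/dev/null 2>&1; then
    proxy_type=$(grid-proxy-info -type 2>/dev/null)
    if is_rfc_proxy_type "$proxy_type"; then
        echo "RFC 3820 proxy: $proxy_type"
    else
        echo "not an RFC 3820 proxy: ${proxy_type:-no proxy found}"
        echo "a new one can be created with: grid-proxy-init -rfc"
    fi
fi
```

This only inspects the local proxy file; it does not change what MyProxy hands out.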
Did something change in how GRAM5 supports older styles of proxy
certificates? The older proxies are still good for authenticating to
GridFTP and the Gatekeeper, but they seem to break when the job manager
tries to send job status messages back to globusrun. This is my
hypothesis, based on this snippet from the Globus documentation:
  "In pre-WS GRAM, the GRAM client is required to delegate a proxy
  credential to the Gatekeeper so that the job manager can send
  authenticated job state change messages."
One thing which might be related (so I'm including it in the same
email): I've tried to use the "-j" globusrun flag to query the version.
Even with the RFC proxy, this fails with:
$ globusrun -r ng1 -j
>
> GRAM version check failed : an end-of-file was reached
> globus_xio: An end of file occurred
and I see the following in $GLOBUS_LOCATION/var/globus-gatekeeper.log:
> PID: 11385 -- Notice: 0: executing /opt/globus/libexec/globus-job-manager
> TIME: Mon Mar 15 15:16:56 2010
> PID: 11385 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=11
> ts=2010-03-15T02:16:56Z id=11386 event=gram_gsi_get_subject.start level=TRACE
> TIME: Mon Mar 15 15:16:56 2010
> PID: 11385 -- Notice: 0: Child 11386 started
> ts=2010-03-15T02:16:56Z id=11386 event=gram_gsi_get_subject.end level=ERROR
> status=-29 \
> reason="Error getting subject\nGSS Major Status: Problem with local
> credentials\n \
> GSS Minor Status Error Chain:\nglobus_gsi_gssapi: Error with GSI credential\n
> \
> globus_gsi_gssapi: Error with gss credential handle\n
> globus_credential: Valid credentials could not be found in any of the
> possible locations specified by the credential search order.\n
> Valid credentials could not be found in any of the possible locations
> specified by the credential search order.\n
> Attempt 1\nglobus_credential: Error reading host
> credential\nglobus_sysconfig: Error with certificate filename\n
> globus_sysconfig: Error with certificate filename\n
> globus_sysconfig: File is not owned by current user:
> /etc/grid-security/hostcert.pem is not owned by current user\n
> Attempt 2\nglobus_credential: Error reading proxy credential\n
> globus_sysconfig: Could not find a valid proxy certificate file
> location\nglobus_sysconfig: Error with key filename\n
> globus_sysconfig: File does not exist: /tmp/x509up_u95008 is not a valid
> file\n
> Attempt 3\nglobus_credential: Error reading user credential\n
> globus_sysconfig: Error with certificate filename: The user cert could not be
> found in: \n
> 1) env. var. X509_USER_CERT\n2) $HOME/.globus/usercert.pem\n3)
> $HOME/.globus/usercred.p12\n\n\n"
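The log walks through three credential locations and rejects each one for a different reason. As a rough sketch, the same checks can be repeated by hand, run as the account the job manager runs under. The function name check_cred_file is made up for this example, and the proxy and user cert paths are only the defaults named in the log (overridable via X509_USER_PROXY and X509_USER_CERT); this is not how the Globus libraries do the lookup internally.

```shell
# Hypothetical helper: report whether a credential file exists and is
# owned by the current user, mirroring the two failure reasons in the log.
check_cred_file() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
    elif [ ! -O "$1" ]; then
        echo "not owned by current user: $1"
    else
        echo "ok: $1"
    fi
}

# Same order as the attempts in the log: host cert, proxy, user cert.
check_cred_file /etc/grid-security/hostcert.pem
check_cred_file "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
check_cred_file "${X509_USER_CERT:-$HOME/.globus/usercert.pem}"
```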
Any idea what's wrong?
Cheers,
Vladimir
--
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Supercomputing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand
http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone: +64 3 364 3012
Mobile: +64 21 997 352
Fax: +64 3 364 2332