It's a DNS/reverse-DNS problem. Your client is taking the address you're specifying, converting it to an IP, then converting it back to a name via reverse DNS. For whatever reason, it's not getting the name back that is in the CN= part of the host certificate. If you were on a newer version than 4.0.1, you probably would have gotten an error message that explained that more clearly. If I had to guess, I'd say you were using or getting localhost in one part of the translation, and that's what did it. So it might work from other machines even if it doesn't work to itself.
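
A minimal Python sketch of that round trip, in case it helps to see which name comes back. The helper names (`round_trip`, `matches_cert_cn`) are hypothetical, and the comparison against the part of the CN after `host/` is an assumption based on the description above, not taken from the Globus source.

```python
import socket

def round_trip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name, as described in the paragraph above."""
    ip = forward(name)
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the primary name
    reverse_name = reverse(ip)[0]
    return ip, reverse_name

def matches_cert_cn(reverse_name, cert_cn):
    """Compare the reverse-resolved name with a CN of the form 'host/<fqdn>'.

    Assumption: only the part after 'host/' is compared, case-insensitively.
    """
    expected = cert_cn.split("/", 1)[1] if "/" in cert_cn else cert_cn
    return reverse_name.lower() == expected.lower()
```

Calling `round_trip("ardbeg.cs.st-andrews.ac.uk")` on the affected machine and passing the second element to `matches_cert_cn` together with `"host/ardbeg.cs.st-andrews.ac.uk"` would show whether the name survives the round trip.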

Charles

On Mar 12, 2009, at 3:59 PM, Vladimir Janjic wrote:



On Thu, Mar 12, 2009 at 8:51 PM, Charles Bacon <[email protected]> wrote:
Okay.  What happens if the client submits:
globus-job-run ardbeg.cs.st-andrews.ac.uk::/O=Grid/OU=SCIEnce/CN=host/ardbeg.cs.st-andrews.ac.uk

wow, this actually worked! yipe!!!!




If you have a recent openssl installation (that is, not the one from 4.0.1), try these commands:
openssl x509 -in <hostcert> -noout -issuer_hash
openssl x509 -in <usercert> -noout -issuer_hash

[...@ardbeg .globus]$ openssl x509 -in usercert.pem -noout -issuer_hash
e03d7d8e
[...@ardbeg .globus]$ openssl x509 -in /etc/grid-security/hostcert.pem -noout -issuer_hash
e03d7d8e


If your openssl doesn't support issuer_hash, just use issuer. That might help us distinguish whether the signing CAs are really the same or not.


ok, i could live with always specifying ::/O=Grid/OU=SCIEnce/CN=host/ardbeg.cs.st-andrews.ac.uk when running jobs (hopefully this would also work from an .rsl file), but i am still curious why i have to enter this, because on other machines that use the same setup i don't have to.

thanks a lot, charles!




Charles


On Mar 12, 2009, at 3:46 PM, Vladimir Janjic wrote:

yes, user's certificate is signed by the same CA.
this is the output of grid-proxy-init -debug -verify

[...@ardbeg ~]$ grid-proxy-init -verify -debug

User Cert File: /home/vj/.globus/usercert.pem
User Key File: /home/vj/.globus/userkey.pem

Trusted CA Cert Dir: /etc/grid-security/certificates

Output File: /tmp/x509up_u516
Your identity: /O=Grid/OU=SCIEnce/CN=Vladimir Janjic
Enter GRID pass phrase for this identity:
Creating proxy ......++++++++++++
......++++++++++++
 Done
Proxy Verify OK
Your proxy is valid until: Fri Mar 13 08:40:51 2009


here are the lines from usercert.pem and hostcert.pem, if that helps

usercert.pem :
Certificate:
   Data:
       Version: 3 (0x2)
       Serial Number: 98 (0x62)
       Signature Algorithm: md5WithRSAEncryption
       Issuer: O=Grid, OU=SCIEnce, CN=IeAT CA
       Validity
           Not Before: Jun  9 14:40:05 2008 GMT
           Not After : Jun  9 14:40:05 2009 GMT
...


hostcert.pem :
Certificate:
   Data:
       Version: 3 (0x2)
       Serial Number: 131 (0x83)
       Signature Algorithm: md5WithRSAEncryption
       Issuer: O=Grid, OU=SCIEnce, CN=IeAT CA
       Validity
           Not Before: Mar  9 13:34:31 2009 GMT
           Not After : Mar  9 13:34:31 2010 GMT
       Subject: O=Grid, OU=SCIEnce, CN=host/ardbeg.cs.st-andrews.ac.uk


vladimir

On Thu, Mar 12, 2009 at 8:34 PM, Charles Bacon <[email protected]> wrote:
Could still be a permissions issue in /etc/grid-security/certificates. Is the user's certificate signed by the same CA? What is the output of grid-proxy-init -debug -verify?

-c


On Mar 12, 2009, at 2:20 PM, Vladimir Janjic wrote:

Thanks very much for the answer, Charles!

But, unfortunately this didn't fix the error. the entry for gatekeeper is

service gsigatekeeper
{
 socket_type = stream
 protocol = tcp
 wait = no
 user = root
 env = LD_LIBRARY_PATH=/usr/local/globus-4.0.1/lib
 server = /usr/local/globus-4.0.1/sbin/globus-gatekeeper
 server_args = -conf /usr/local/globus-4.0.1/etc/globus-gatekeeper.conf
 disable = no
}

so, it doesn't set any X509_* variable in 'env = ...'. issuer of /etc/grid-security/hostcert.pem is the CA that exists in /etc/grid-security/certificates (actually, it is the same issuer as for another two machines on which everything works, and /etc/grid-security/certificates is the same on all three machines). i don't have X509_* set in my environment, and i only have the certificate in my .globus directory. i assume that the error must be somewhere in my local setup, but i cannot find out where.

thanks a lot,
vladimir

On Thu, Mar 12, 2009 at 6:44 PM, Charles Bacon <[email protected]> wrote:
Your local client does not trust the gatekeeper, guaranteed. The "delegation protocol violation" the gatekeeper is reporting is that the client is disconnecting before performing delegation. The only reason the client would disconnect like that is because it failed to authorize the gatekeeper's certificate.

Double-check your xinetd entry for the gatekeeper to make sure no X509_* environment variables are being set. Then check the issuer of your /etc/grid-security/hostcert.pem. Then check that that CA exists in the /etc/grid-security/certificates directory. Then double-check that your client environment doesn't have any X509_* variables set. Then make sure you don't have a $HOME/.globus/certificates directory.

One of those diagnostic steps should reveal where the problem is.
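
The client-side checks in that list can be scripted. A rough Python sketch, assuming only the paths and variable prefix named above; the function names are hypothetical, and it does not inspect the xinetd entry or the certificate issuer.

```python
import os

def x509_variables(env=None):
    """Return the X509_* variables set in the given environment mapping."""
    env = os.environ if env is None else env
    return {k: v for k, v in env.items() if k.startswith("X509_")}

def user_cert_dir(home=None):
    """Return <home>/.globus/certificates if that directory exists, else None."""
    home = os.path.expanduser("~") if home is None else home
    path = os.path.join(home, ".globus", "certificates")
    return path if os.path.isdir(path) else None
```

An empty dict from `x509_variables()` and `None` from `user_cert_dir()` would mean the last two checks in the list come back clean for the current user.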


Charles


On Mar 12, 2009, at 1:31 PM, Vladimir Janjic wrote:

Hi all,

I am having a problem with Globus 4.0.1, and I have no idea what is causing it or how I can solve it.

The problem is I cannot submit any job to the Gatekeeper, because I get

GRAM Job submission failed because an authorization operation failed (error code 7)

error. The globus-gatekeeper.log file gives the following error when I try to run, for example,
globus-job-run ardbeg.cs.st-andrews.ac.uk /bin/date :

TIME: Thu Mar 12 18:16:52 2009
PID: 28192 -- Notice: 6: Got connection 138.251.214.66 at Thu Mar 12 18:16:52 2009

GSS authentication failure
GSS Major Status: General failure
GSS Minor Status Error Chain:
globus_gsi_gssapi: Error during delegation: Delegation protocol violation
Failure: GSS failed Major:000d0000 Minor:00000001 Token:00000000

TIME: Thu Mar 12 18:16:52 2009
PID: 28192 -- Failure: GSS failed Major:000d0000 Minor:00000001 Token:00000000

I am submitting the job to the gatekeeper which is on the same machine. I have read somewhere that the problem might be that my certificate doesn't trust the host's certificate, and that it is disconnecting from the gatekeeper immediately.
But I can easily run jobs on the gatekeeper locally on one other cluster (wnxxx.grid.info.uvt.ro), using the same user certificate as on ardbeg.cs.st-andrews.ac.uk. Also, the hostcert.pem on the wnxxx.grid.info.uvt.ro cluster is signed by the same CA as hostcert.pem on the ardbeg.cs.st-andrews.ac.uk machine, and the files in /etc/grid-security/certificates are the same on both machines.

I am desperate, because I need to run some tests on this machine, but I cannot because of these problems.

Please help!!!!!!!!

Vladimir








