I made the change in VDT's vdt-local-setup.sh,
which I know doesn't get modified, and now the EPR shows the right
IP in it, and the example you gave works.
But my initial example still doesn't.

bash-3.00$ globusrun-ws -submit -batch -o foo.epr -F fnpcosg1.fnal.gov:9443 -FtCondor -c /usr/bin/id
Submitting job...Done.
Job ID: uuid:decb6502-438f-11dd-9611-001422086c92
Termination time: 06/27/2008 14:55 GMT
bash-3.00$ more foo.epr
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/ad
dressing"><ns00:Address>https://131.225.166.2:9443/wsrf/services/ManagedExecutab
leJobService</ns00:Address><ns00:ReferenceProperties><ResourceID xmlns="http://w
ww.globus.org/namespaces/2004/10/gram/job">df3b1b40-438f-11dd-88db-cf7a593808fb<
/ResourceID></ns00:ReferenceProperties><wsa:ReferenceParameters xmlns:wsa="http:
//schemas.xmlsoap.org/ws/2004/03/addressing"/></ns00:EndpointReferenceType>
bash-3.00$ globusrun-ws -monitor -j foo.epr -F fnpcosg1.fnal.gov:9443 -Ft Condor
Current job state: Done
Requesting original job description...Done.
Destroying job...Done.
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:fa78cea2-438f-11dd-a905-001422086c92
Termination time: 06/27/2008 14:55 GMT
globusrun-ws: globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:355cc8b6-4390-11dd-a249-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws: globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:3a4ee764-4390-11dd-bb28-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws: globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov

Any idea what else we might have to fix?

Steve Timm


------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
[EMAIL PROTECTED]  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

On Thu, 26 Jun 2008, Charles Bacon wrote:

On Jun 26, 2008, at 9:09 AM, Steven Timm wrote:

On Thu, 26 Jun 2008, Charles Bacon wrote:

As an experiment, can you tell me what happens if you run the job in two parts:
First, try -submit -batch -o foo.epr
Check what hostname/IP shows up in the EPR as the endpoint of the service.
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/ad
dressing"><ns00:Address>https://131.225.167.18:9443/wsrf/services/ManagedExecuta
bleJobService</ns00:Address><ns00:ReferenceProperties><ResourceID xmlns="http://
www.globus.org/namespaces/2004/10/gram/job">da7e0c90-4388-11dd-96e1-d1739b31397d
</ResourceID></ns00:ReferenceProperties><wsa:ReferenceParameters xmlns:wsa="http
://schemas.xmlsoap.org/ws/2004/03/addressing"/></ns00:EndpointReferenceType>

That's the wrong IP; it should be the other one.

Okay. So, that's going to be the difference between globus-job-run and globusrun-ws. The globusrun-ws client is getting back an address from the container that it will use to get further updates. The (submit/batch) part of the job is using the address you hand-supplied on the commandline, so it's working. The (monitor) part of the client is failing because the service is returning a bad address.
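To make the EPR check repeatable, here is a minimal shell sketch that pulls the host and port out of a saved EPR. It assumes the EPR was written with -o foo.epr, that the file contains a single https:// Address as in the output above, and that grep supports -o (GNU grep does). The function name epr_endpoint is made up for illustration.

```shell
# Hypothetical helper: print host:port from the Address element of a saved EPR.
# Assumes the only https:// URL in the file is the service address,
# which is the case for the EPRs shown in this thread.
epr_endpoint() {
    grep -o 'https://[^/<"]*' "$1" | head -n 1 | sed 's|^https://||'
}

# Example use (foo.epr and the expected address are placeholders):
# [ "$(epr_endpoint foo.epr)" = "131.225.166.2:9443" ] \
#     || echo "container is advertising a different address"
```

If the printed address is not the one you passed with -F, the monitor step will contact the wrong endpoint.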

The fix is to get the container to bind to the right address, which you can do with GLOBUS_HOSTNAME.

As far as I can tell, GLOBUS_HOSTNAME is not set in the environment
of the container.  What's the best way to set it in a VDT environment?
I did set GLOBUS_HOSTNAME to fnpcosg1 before I installed the VDT.
I am now running the container in full debug mode, so if there
are any logs you need to see, let me know.

It's starting globus-start-container out of /etc/init.d/globus-ws. It looks like it sources both setup.sh and vdt/etc/globus-options.sh. globus-options.sh looks like it is intended to set up the JVM options used by the container. If I were going to set GLOBUS_HOSTNAME, based on what I've seen I'd put it in the init.d script or in the globus-options.sh file. I'm not sure whether those two are vulnerable to being overwritten during a pacman update or by a vdt-control on/off.
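As a rough sketch, the setting itself is just an exported variable in whichever file ends up being sourced before the container starts. The value below is an assumption taken from the host name used on the command lines in this thread; substitute the name or IP the container should advertise.

```shell
# Assumed placement: a file sourced before globus-start-container runs
# (the init.d script, globus-options.sh, or a local setup file).
# The value is illustrative; use the address clients should be told to contact.
GLOBUS_HOSTNAME=fnpcosg1.fnal.gov
export GLOBUS_HOSTNAME
```

The container has to be restarted for the new value to take effect, since it reads the variable at startup.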

The other place you can fix it that's not VDT-specific is under $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd. The options are described at http://www.globus.org/toolkit/docs/4.0/common/javawscore/admin-index.html#id2531913. Basically, adding <parameter name="logicalHost" value="the.right.ip.address"/> to the globalConfiguration section is equivalent to setting your GLOBUS_HOSTNAME to that IP address.
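To confirm what the WSDD file currently says, a small check along these lines may help. It is only a sketch: the function name wsdd_logical_host is hypothetical, and it assumes the name and value attributes sit on one line with name first, as in the parameter shown above.

```shell
# Hypothetical check: print the value of the logicalHost parameter, if any.
# Assumes name="logicalHost" precedes value="..." on the same line.
wsdd_logical_host() {
    grep -o 'name="logicalHost"[^>]*value="[^"]*"' "$1" \
        | sed 's/.*value="\([^"]*\)".*/\1/'
}

# Example use:
# wsdd_logical_host "$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd"
```

No output means the parameter is not set in that file.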


Charles
