Hi all,

I'm running some experiments with Globus Toolkit + PBS.

My test topology consists of three nodes: vm-02 runs pbs_mom; vm-01 runs
the Globus container and the PBS scheduler; and vm-03 has a client-side-only
Globus installation, i.e. the globusrun-ws command (plus a GridFTP server).

The RSL also contains StageIn and StageOut directives. The GridFTP server
works on both vm-01 and vm-03, and the home directory of the user
running the jobs is shared via NFS between vm-01 and vm-02.

If I submit a single job with the command

globusrun-ws -submit -S -Ft PBS -F https://vm-01/wsrf/services/ManagedJobFactoryService -f job.rsl

it runs normally, with no errors.

If I submit the same command repeatedly, as parallel requests for more than ten jobs, I get errors like the ones below. It looks like a GridFTP server error; however,
my configuration sets no limit on the number of GridFTP connections.
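
For anyone who wants to reproduce this, here is a minimal sketch of how the parallel submissions could be driven. It assumes Python 3.7+ is available on the client node; the helper names (run_once, submit_parallel, count_failures) are made up for illustration and are not part of Globus. A failure is detected by looking for the "Job failed" text that globusrun-ws prints, as in the output below.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# The command from this message; adjust the factory URL and RSL file as needed.
SUBMIT_CMD = [
    "globusrun-ws", "-submit", "-S", "-Ft", "PBS",
    "-F", "https://vm-01/wsrf/services/ManagedJobFactoryService",
    "-f", "job.rsl",
]


def run_once(cmd):
    """Run one submission and return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout


def submit_parallel(cmd, n_jobs):
    """Start n_jobs copies of cmd concurrently and collect all results."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(run_once, [cmd] * n_jobs))


def count_failures(results):
    """Count the runs whose output contains the 'Job failed' message."""
    return sum(1 for _, output in results if "Job failed" in output)
```

Calling count_failures(submit_parallel(SUBMIT_CMD, 12)) would submit twelve jobs at once and report how many of them failed.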

Any help?

Thank you

Fabrizio.

---
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:8fb52422-0cc4-11de-8080-525400123403
Termination time: 03/09/3009 16:10 GMT
Current job state: StageIn
Current job state: Pending
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp. Authentication with credential only failed on server vm-01 [Caused by: Connection reset] <--
---

and

----
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:893ce4b8-0cc9-11de-aabf-525400123403
Termination time: 03/09/3009 16:47 GMT
Current job state: StageIn
Current job state: Pending
Current job state: Active
Current job state: StageOut
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageOut. Authentication with credential only failed on server vm-01 [Caused by: java.io.EOFException] <--
------

and finally a snippet from the container log on vm-01 (which receives the
requests and dispatches them to the PBS scheduler):

---
2009-03-09T18:17:11.214+01:00 ERROR service.TransferWork [Thread-15,oldLog:175] Transient transfer error Authentication with credential only failed on server vm-01 [Caused by: Connection reset] Authentication with credential only failed on server vm-01. Caused by java.net.SocketException: Connection reset
----
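
To see which staging step fails, and with which cause, across many runs, the globusrun-ws failure lines shown above can be tallied. This is only a sketch: the regular expression is written against the two failure messages in this post and may need adjusting for other message formats, and summarize_failures is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches failure lines of the form shown in this post:
#   "... Staging error for RSL element <name>. ... [Caused by: <cause>]"
FAILURE_RE = re.compile(
    r"Staging error for RSL element (\w+)\..*?\[Caused by: ([^\]]+)\]"
)


def summarize_failures(text):
    """Count (RSL element, cause) pairs found in captured globusrun-ws output."""
    return Counter(FAILURE_RE.findall(text))
```

Feeding it the concatenated output of all submissions gives one count per (RSL element, cause) pair, which shows whether the resets cluster in fileStageOut, fileCleanUp, or both.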



--
========================================================================
mailto:[email protected]     Fabrizio Messina
Tel: +39 095 7383000 (direct)   Dipartimento di Matematica e Informatica
Mobile: +39 345.70.991.70       Universita' di Catania
Fax: +39 095 330094             Viale A. Doria 6, 95125 Catania, Italy
PGP Key Available @ 0xF5BDE774
