Hi all,
I'm doing some experiments with the Globus Toolkit + PBS.
My experimental topology consists of three nodes: one (vm-02) runs
pbs_mom; another (vm-01) runs the Globus container and the PBS scheduler;
and the last (vm-03) has a client-side-only Globus installation, i.e., the
globusrun-ws command (plus the GridFTP server).
The RSL also contains StageIn and StageOut directives. The GridFTP server
works on both vm-01 and vm-03, and the home directory of the user running
the jobs is shared via NFS between vm-01 and vm-02.
If I launch the job with the command

  globusrun-ws -submit -S -Ft PBS -F https://vm-01/wsrf/services/ManagedJobFactoryService -f job.rsl

it executes normally, with no errors.
However, if I run the above command concurrently to submit more than ten
jobs in parallel, I get errors like the following. It looks like a GridFTP
server error; however, my configuration places no limit on the number of
GridFTP connections.
Any help?
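For anyone trying to reproduce this, the parallel submission can be scripted. Below is a minimal Python sketch: it runs the same globusrun-ws invocation N times concurrently and flags a run as failed if it exits non-zero or prints "Job failed" (checked as a precaution, in case the exit status alone does not reflect the job outcome). The helper names `build_command`, `submit_parallel` and `failed_indices` are hypothetical, not part of the Globus Toolkit, and it assumes globusrun-ws is on PATH and job.rsl is in the working directory.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Factory endpoint and RSL file from the command above.
# Assumption: globusrun-ws is on PATH and job.rsl is in the working directory.
FACTORY_URL = "https://vm-01/wsrf/services/ManagedJobFactoryService"
RSL_FILE = "job.rsl"


def build_command(factory_url=FACTORY_URL, rsl_file=RSL_FILE):
    """Return the argv of the single-job submission shown above."""
    return ["globusrun-ws", "-submit", "-S", "-Ft", "PBS",
            "-F", factory_url, "-f", rsl_file]


def submit_parallel(n_jobs, command=None):
    """Run n_jobs copies of `command` concurrently.

    Returns a list of (index, returncode, combined stdout/stderr) tuples,
    in submission order.
    """
    argv = command if command is not None else build_command()

    def run_one(i):
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
        return i, proc.returncode, proc.stdout

    # One worker per job so that all submissions start at (roughly) the same time.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(run_one, range(n_jobs)))


def failed_indices(results):
    """Indices of runs that exited non-zero or reported 'Job failed'."""
    return [i for i, rc, out in results if rc != 0 or "Job failed" in out]
```

For example, `failed_indices(submit_parallel(12))` lists which of twelve concurrent submissions failed.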
Thank you
Fabrizio.
---
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:8fb52422-0cc4-11de-8080-525400123403
Termination time: 03/09/3009 16:10 GMT
Current job state: StageIn
Current job state: Pending
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp.
Authentication with credential only failed on server vm-01 [Caused by:
Connection reset] <--
---
and
----
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:893ce4b8-0cc9-11de-aabf-525400123403
Termination time: 03/09/3009 16:47 GMT
Current job state: StageIn
Current job state: Pending
Current job state: Active
Current job state: StageOut
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageOut.
Authentication with credential only failed on server vm-01 [Caused by:
java.io.EOFException] <--
------
And finally, a snapshot of the container log on vm-01 (the node that receives
the requests and dispatches them to the PBS scheduler):
---
2009-03-09T18:17:11.214+01:00 ERROR service.TransferWork
[Thread-15,oldLog:175] Transient transfer error
Authentication with credential only failed on server vm-01 [Caused by:
Connection reset]
Authentication with credential only failed on server vm-01. Caused by
java.net.SocketException: Connection reset
----
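Since the two client-side failures above hit different staging elements (fileCleanUp and fileStageOut) with different causes (Connection reset and java.io.EOFException), it may help to tally them over many runs. A minimal Python sketch, assuming each run's globusrun-ws output has been saved as a string; `classify` and `tally` are hypothetical helper names, and the regular expression matches only the message format shown in this post.

```python
import re
from collections import Counter

# Matches the final globusrun-ws error message shown above, e.g.
# "Staging error for RSL element fileStageOut. Authentication with credential
#  only failed on server vm-01 [Caused by: java.io.EOFException]".
# The message wraps over lines, so \s+ and DOTALL let it span line breaks.
STAGING_ERROR = re.compile(
    r"Staging error for RSL element (\w+)\.\s+"
    r"Authentication with credential only failed on server (\S+)\s+"
    r"\[Caused by:\s+(.+?)\]",
    re.DOTALL,
)


def classify(output):
    """Return (rsl_element, server, cause) for a staging failure, or None."""
    m = STAGING_ERROR.search(output)
    if m is None:
        return None
    element, server, cause = m.groups()
    # Collapse any line breaks or repeated spaces inside the cause text.
    return element, server, " ".join(cause.split())


def tally(outputs):
    """Count staging failures by (rsl_element, server, cause)."""
    return Counter(c for c in map(classify, outputs) if c is not None)
```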
--
========================================================================
Fabrizio Messina
Tel: +39 095 7383000 (direct) Dipartimento di Matematica e Informatica
Mobile: +39 345.70.991.70 Universita' di Catania
Fax: +39 095 330094 Viale A. Doria 6, 95125 Catania, Italy
PGP Key Available @ 0xF5BDE774