Can you send the GridFTP server logs as well?

Adam Bazinet wrote:
Hi,

I'm getting a strange error with every job I submit to Condor through my GT
4.2.0 installation.  The job submits and runs fine, but fails during the
fileCleanUp stage.  Here is the error as reported by globusrun-ws:

[EMAIL PROTECTED]:/export/grid_files/171727100.24523977945405806> globusrun-ws -status -j jobEPR.txt
Current job state: Failed
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp.
Connection creation error [Caused by: java.io.EOFException]

The relevant snippet from the job description is here:

<fileCleanUp>
<deletion>
<file>file:///${GLOBUS_SCRATCH_DIR}/171727100.24523977945405806/</file>
</deletion>
</fileCleanUp>

I can assure you there is nothing special about the directory in question.
Submissions to our custom BOINC job manager (with the same fileCleanUp
block) in the same container work just fine, and another identical 4.2.0
installation on another host submits to Condor without this error.  However,
I can't get it to work in this container.  One difference is that this is a
RHEL5 host, while the other host I just mentioned is running RHEL4.

I turned on RFT debugging and I can narrow down the error to this attempt:

2008-09-24T13:47:12.125-04:00 ERROR cache.ConnectionManager [Thread-32,createNewConnection:345] Can't create connection: java.io.EOFException
2008-09-24T13:47:12.127-04:00 ERROR service.TransferWork [Thread-32,run:413] Transient transfer error
Connection creation error [Caused by: java.io.EOFException]
Connection creation error. Caused by java.io.EOFException
    at org.globus.ftp.vanilla.Reply.<init>(Reply.java:78)
    at org.globus.ftp.vanilla.FTPControlChannel.read(FTPControlChannel.java:342)
    at org.globus.ftp.vanilla.FTPControlChannel.readInitialReplies(FTPControlChannel.java:225)
    at org.globus.ftp.vanilla.FTPControlChannel.open(FTPControlChannel.java:214)
    at org.globus.ftp.GridFTPClient.<init>(GridFTPClient.java:74)
    at org.globus.transfer.reliable.service.cache.SingleConnectionImpl.<init>(SingleConnectionImpl.java:66)
    at org.globus.transfer.reliable.service.cache.ConnectionManager.createNewConnection(ConnectionManager.java:327)
    at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:190)
    at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:127)
    at org.globus.transfer.reliable.service.client.DeleteClient.<init>(DeleteClient.java:43)
    at org.globus.transfer.reliable.service.client.ClientFactory.createDeleteClient(ClientFactory.java:61)
    at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:347)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:595)
2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,setFault:219] setting transient fault
2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,processStates:246] [Request 62, Transfer 250] processing state for transfer of gsiftp://lysine.umiacs.umd.edu:2811/fs/mikehomes/gt4admin/.globus/scratch/171727100.24523977945405806/ ->  null

I guess a transfer to 'null' in RFT really means "delete the directory", but
it consistently fails with this EOFException.  What puzzles me most is that
it only occurs when submitting to Condor; I've already reinstalled the
entire gt4-gram-condor unit, but there was no change.  I'll attach the
container log with RFT/GRAM debug turned on.
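
For what it's worth, the stack trace shows the EOF being raised inside
FTPControlChannel.readInitialReplies, that is, while the client is waiting
for the server's first reply on the control channel. Below is a minimal
sketch for checking that one step outside the container, assuming Python 3
is available on the host. The helper names read_greeting and
check_control_channel are hypothetical, and port 2811 is simply the one in
the gsiftp URL from the log. It does no GSI authentication at all; it only
reports whether any greeting line arrives before the connection is closed.

```python
import socket


def read_greeting(sock, timeout=10.0):
    """Read the first reply line from an already-connected control socket.

    Raises EOFError if the peer closes the connection before a complete
    line has been received, which is roughly the condition the Java
    client reports as java.io.EOFException.
    """
    sock.settimeout(timeout)
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            raise EOFError("connection closed before a greeting was received")
        buf += chunk
    return buf.decode("ascii", errors="replace").rstrip("\r\n")


def check_control_channel(host, port=2811, timeout=10.0):
    """Connect to host:port and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_greeting(sock, timeout)
```

Calling check_control_channel("lysine.umiacs.umd.edu") from the container
host would either return the server's first reply line or raise EOFError.
An EOFError here would suggest the server side is dropping the connection
before sending anything, in which case the GridFTP server logs are the
place to look.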

thanks,
Adam
