Can you send the gridftp server logs as well?

Adam Bazinet wrote:

Hi,

I'm getting a strange error with every job I submit to Condor through my GT 4.2.0 installation. The job submits and runs fine, but fails during the fileCleanUp stage. Here's one look at the error:

    [EMAIL PROTECTED]:/export/grid_files/171727100.24523977945405806> globusrun-ws -status -j jobEPR.txt
    Current job state: Failed
    globusrun-ws: Job failed: Staging error for RSL element fileCleanUp.
    Connection creation error [Caused by: java.io.EOFException]

The relevant snippet from the job description is here:

    <fileCleanUp>
        <deletion>
            <file>file:///${GLOBUS_SCRATCH_DIR}/171727100.24523977945405806/</file>
        </deletion>
    </fileCleanUp>

I can assure you there is nothing special about the directory in question. In fact, submissions to our custom BOINC job manager (with the same fileCleanUp block) in the same container work just fine. In fact, we have another identical 4.2.0 installation on another host that submits to Condor just fine. However, I can't seem to get it to work in this container. One difference is that this is a RHEL5 host, and the other host I just mentioned is running RHEL4.

I turned on RFT debugging and I can narrow down the error to this attempt:

    2008-09-24T13:47:12.125-04:00 ERROR cache.ConnectionManager [Thread-32,createNewConnection:345] Can't create connection: java.io.EOFException
    2008-09-24T13:47:12.127-04:00 ERROR service.TransferWork [Thread-32,run:413] Transient transfer error
    Connection creation error [Caused by: java.io.EOFException]
    Connection creation error. Caused by java.io.EOFException
        at org.globus.ftp.vanilla.Reply.<init>(Reply.java:78)
        at org.globus.ftp.vanilla.FTPControlChannel.read(FTPControlChannel.java:342)
        at org.globus.ftp.vanilla.FTPControlChannel.readInitialReplies(FTPControlChannel.java:225)
        at org.globus.ftp.vanilla.FTPControlChannel.open(FTPControlChannel.java:214)
        at org.globus.ftp.GridFTPClient.<init>(GridFTPClient.java:74)
        at org.globus.transfer.reliable.service.cache.SingleConnectionImpl.<init>(SingleConnectionImpl.java:66)
        at org.globus.transfer.reliable.service.cache.ConnectionManager.createNewConnection(ConnectionManager.java:327)
        at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:190)
        at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:127)
        at org.globus.transfer.reliable.service.client.DeleteClient.<init>(DeleteClient.java:43)
        at org.globus.transfer.reliable.service.client.ClientFactory.createDeleteClient(ClientFactory.java:61)
        at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:347)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:595)
    2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,setFault:219] setting transient fault
    2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,processStates:246] [Request 62, Transfer 250] processing state for transfer of gsiftp://lysine.umiacs.umd.edu:2811/fs/mikehomes/gt4admin/.globus/scratch/171727100.24523977945405806/ -> null

I guess a transfer to 'null' in RFT really means delete the directory. However, it is consistently failing with this strange EOFException. To me the fact that it only occurs when submitting to Condor is really strange; I've already reinstalled the entire gt4-gram-condor unit but there was no change. I'll attach the container log with RFT/GRAM debug turned on.

thanks,
Adam
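
The stack trace above shows the EOFException being raised in FTPControlChannel.readInitialReplies, that is, while the client is reading the server's first reply on the control channel. One way to check that step in isolation is to open a plain TCP connection to the GridFTP port and see whether a greeting line arrives before the connection closes. The following is only a rough sketch under that assumption: the function name read_ftp_greeting is made up for illustration, it does no GSI authentication, and it is not part of Globus or RFT.

```python
import socket


def read_ftp_greeting(host, port=2811, timeout=10.0):
    """Return the first line the server sends on the control channel.

    Returns None if the server closes (or resets) the connection before
    sending anything, which is the situation a client would see as an
    end-of-file while reading the initial reply.
    Hypothetical helper for illustration; not a Globus API.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        while b"\n" not in data:
            try:
                chunk = sock.recv(4096)
            except ConnectionResetError:
                chunk = b""
            if not chunk:
                # Peer closed before a complete line arrived.
                text = data.decode("ascii", "replace").strip()
                return text or None
            data += chunk
        first_line = data.split(b"\n", 1)[0]
        return first_line.decode("ascii", "replace").strip()
```

Calling read_ftp_greeting with the host and port from the log line (lysine.umiacs.umd.edu, 2811) from the machine running the container would be the natural test. A return value of None would suggest the server side is dropping the connection before greeting, in which case the gridftp server logs requested at the top of this thread are the place to look.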
