Hi,

We are experiencing a strange problem that causes jobs to fail, apparently at random and only occasionally. At any given time we may have 100 active GRAM jobs on a given resource. All of these jobs submit fine, and there are usually no immediate failures. Every so often, however, one of them fails, sometimes days after it was submitted, with the following error:
[EMAIL PROTECTED]:/export/grid_files/260600020.09477316738932795> globusrun-ws -status -j jobEPR.txt
Current job state: Failed
globusrun-ws: Job failed: Invalid executable path "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".
ProcessDied

Now, the resource we are submitting to is unique in that we don't actually transfer in the hmmpfam executable. That path is just a dummy: our custom BOINC job manager never uses it, because BOINC executables live elsewhere. So until recently nothing ever existed at that path, and the error *kinda* made sense; what didn't make sense was why it only happened at random.

To make the problem go away, I changed the BOINC job manager to create a dummy executable at that path when the job is submitted. That doesn't seem to have helped, because the error is still popping up. *Now* the error message certainly doesn't make sense taken at face value: technically speaking, that path has been "valid" for the entire lifetime of the job in question, yet the job still failed.

In case it helps, I've attached the debug output below, though I wasn't able to glean any additional information from it. Does anyone have a guess as to why this happens in the first place, and why it happens so randomly and infrequently? This one is costing us big time, because when a job fails Globus deletes all the output collected so far, and these are large batches of work.

Thanks!
Adam

[EMAIL PROTECTED]:/export/grid_files/260600020.09477316738932795> globusrun-ws -debug -status -j jobEPR.txt
=== REQUEST MESSAGE (length 816) (time 1206627634.506184000) ===
<ns00:Envelope xmlns:ns00="http://schemas.xmlsoap.org/soap/envelope/">
  <ns00:Header></ns00:Header>
  <ns00:Body>
    <ns01:GetMultipleResourceProperties xmlns:ns01="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd">
      <ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:state</ns01:ResourceProperty>
      <ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:holding</ns01:ResourceProperty>
      <ns01:ResourceProperty xmlns:ns03="http://www.globus.org/namespaces/2004/10/gram/job/faults">ns03:fault</ns01:ResourceProperty>
      <ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:exitCode</ns01:ResourceProperty>
    </ns01:GetMultipleResourceProperties>
  </ns00:Body>
</ns00:Envelope>
----------------------------------------------
=== RESPONSE MESSAGE (length 6399) (time 1206627634.546965000) ===
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
  <soapenv:Header>
    <wsa:MessageID soapenv:mustUnderstand="0">uuid:f6d14500-fc08-11dc-a90a-8a2384ad991f</wsa:MessageID>
    <wsa:To soapenv:mustUnderstand="0">http://schemas.xmlsoap.org/ws/2004/03/addressing/role/anonymous</wsa:To>
    <wsa:Action soapenv:mustUnderstand="0">http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties/GetMultipleResourcePropertiesResponse</wsa:Action>
    <wsa:From soapenv:mustUnderstand="0" xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job">
      <wsa:Address>https://128.8.120.35:8443/wsrf/services/ManagedExecutableJobService</wsa:Address>
      <wsa:ReferenceProperties>
        <ns4:ResourceID xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job">a1253c20-fa72-11dc-a908-8a2384ad991f</ns4:ResourceID>
      </wsa:ReferenceProperties>
    </wsa:From>
    <wsa:RelatesTo RelationshipType="wsa:Reply" soapenv:mustUnderstand="0">uuid:f6d04560-fc08-11dc-b962-000f1f66888a</wsa:RelatesTo>
  </soapenv:Header>
  <soapenv:Body>
    <GetMultipleResourcePropertiesResponse xmlns="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd">
      <ns1:state xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/types">Failed</ns1:state>
      <ns2:holding xmlns:ns2="http://www.globus.org/namespaces/2004/10/gram/job/types">false</ns2:holding>
      <ns3:fault xmlns:ns3="http://www.globus.org/namespaces/2004/10/gram/job/faults">
        <ns3:invalidPathFault>
          <ns4:Timestamp xmlns:ns4="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">2008-03-27T06:09:37.203Z</ns4:Timestamp>
          <ns5:Originator xmlns:ns5="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
            <wsa:Address>https://128.8.120.35:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
              <ns6:ResourceID xmlns:ns6="http://www.globus.org/namespaces/2004/10/gram/job">a1253c20-fa72-11dc-a908-8a2384ad991f</ns6:ResourceID>
            </wsa:ReferenceProperties>
            <wsa:ReferenceParameters/>
          </ns5:Originator>
          <ns7:Description xmlns:ns7="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">Invalid executable path "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".</ns7:Description>
          <ns8:FaultCause xmlns:ns8="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
            <ns8:Timestamp>2008-03-27T06:09:37.203Z</ns8:Timestamp>
            <ns8:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
              at java.lang.Class.newInstance0(Class.java:350)
              at java.lang.Class.newInstance(Class.java:303)
              at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:485)
              at org.globus.exec.utils.FaultUtils.createInvalidPathFault(FaultUtils.java:129)
              at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3184)
              at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
              at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:585)
              at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
              at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
            </ns8:ErrorCode>
            <ns8:Description>org.globus.exec.generated.InvalidPathFaultType</ns8:Description>
          </ns8:FaultCause>
          <ns9:FaultCause xmlns:ns9="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
            <ns9:Timestamp>2008-03-27T06:09:37.203Z</ns9:Timestamp>
            <ns9:Description>ProcessDied</ns9:Description>
            <ns9:FaultCause>
              <ns9:Timestamp>2008-03-27T06:09:37.207Z</ns9:Timestamp>
              <ns9:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
                java.lang.Exception: ProcessDied
                at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3127)
                at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
                at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:585)
                at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
                at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
              </ns9:ErrorCode>
              <ns9:Description>java.lang.Exception</ns9:Description>
            </ns9:FaultCause>
            <ns9:FaultCause>
              <ns9:Timestamp>2008-03-27T06:09:37.207Z</ns9:Timestamp>
              <ns9:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
                at org.globus.wsrf.utils.FaultHelper.toBaseFault(FaultHelper.java:282)
                at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:505)
                at org.globus.exec.utils.FaultUtils.createInvalidPathFault(FaultUtils.java:129)
                at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3184)
                at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
                at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:585)
                at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
                at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
              </ns9:ErrorCode>
              <ns9:Description>org.oasis.wsrf.faults.BaseFaultType</ns9:Description>
            </ns9:FaultCause>
          </ns9:FaultCause>
          <ns3:stateWhenFailureOccurred>Active</ns3:stateWhenFailureOccurred>
          <ns3:command>submit</ns3:command>
          <ns3:gt2ErrorCode>5</ns3:gt2ErrorCode>
          <ns3:attribute>executable</ns3:attribute>
          <ns3:path>/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam</ns3:path>
        </ns3:invalidPathFault>
      </ns3:fault>
      <ns10:exitCode xmlns:ns10="http://www.globus.org/namespaces/2004/10/gram/job/types">5</ns10:exitCode>
    </GetMultipleResourcePropertiesResponse>
  </soapenv:Body>
</soapenv:Envelope>
----------------------------------------------
Current job state: Failed
globusrun-ws: Job failed: Invalid executable path "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".
ProcessDied
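Because the fields that matter are buried in that SOAP response, here is a minimal Python sketch for pulling them out with the standard library's xml.etree.ElementTree. The function name summarize_fault and the SAMPLE string (a trimmed excerpt of the response above, with stack traces and addressing headers dropped) are illustrative only; they are not part of Globus or of our job manager. The sketch accepts either the whole soapenv:Envelope or just the GetMultipleResourcePropertiesResponse element, and it assumes the specific fault (here invalidPathFault) is the first child of the fault property.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the GetMultipleResourceProperties response above.
RP_NS = "http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
TYPES_NS = "http://www.globus.org/namespaces/2004/10/gram/job/types"
FAULTS_NS = "http://www.globus.org/namespaces/2004/10/gram/job/faults"
BASEFAULTS_NS = "http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd"
RESPONSE_TAG = f"{{{RP_NS}}}GetMultipleResourcePropertiesResponse"

# Illustrative excerpt of the response body (stack traces and headers omitted).
SAMPLE = """\
<GetMultipleResourcePropertiesResponse xmlns="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd">
  <ns1:state xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/types">Failed</ns1:state>
  <ns2:holding xmlns:ns2="http://www.globus.org/namespaces/2004/10/gram/job/types">false</ns2:holding>
  <ns3:fault xmlns:ns3="http://www.globus.org/namespaces/2004/10/gram/job/faults">
    <ns3:invalidPathFault>
      <ns7:Description xmlns:ns7="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">Invalid executable path "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".</ns7:Description>
      <ns3:stateWhenFailureOccurred>Active</ns3:stateWhenFailureOccurred>
      <ns3:command>submit</ns3:command>
      <ns3:gt2ErrorCode>5</ns3:gt2ErrorCode>
      <ns3:attribute>executable</ns3:attribute>
      <ns3:path>/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam</ns3:path>
    </ns3:invalidPathFault>
  </ns3:fault>
  <ns10:exitCode xmlns:ns10="http://www.globus.org/namespaces/2004/10/gram/job/types">5</ns10:exitCode>
</GetMultipleResourcePropertiesResponse>
"""

# GRAM fault child elements worth reporting (names as they appear in the response).
FAULT_FIELDS = ("stateWhenFailureOccurred", "command", "gt2ErrorCode", "attribute", "path")


def summarize_fault(xml_text: str) -> dict:
    """Return the job state, exit code and the first GRAM fault's fields as strings."""
    root = ET.fromstring(xml_text)
    # Accept either the whole soapenv:Envelope or just the response element.
    response = root if root.tag == RESPONSE_TAG else root.find(f".//{RESPONSE_TAG}")
    if response is None:
        raise ValueError("no GetMultipleResourcePropertiesResponse element found")

    summary = {
        "state": response.findtext(f"{{{TYPES_NS}}}state"),
        "exitCode": response.findtext(f"{{{TYPES_NS}}}exitCode"),
    }
    fault = response.find(f"{{{FAULTS_NS}}}fault")
    if fault is not None and len(fault) > 0:
        detail = fault[0]  # assumption: the specific fault, e.g. invalidPathFault
        summary["faultType"] = detail.tag.split("}", 1)[1]
        summary["description"] = detail.findtext(f"{{{BASEFAULTS_NS}}}Description")
        for name in FAULT_FIELDS:
            summary[name] = detail.findtext(f"{{{FAULTS_NS}}}{name}")
    return summary


if __name__ == "__main__":
    for key, value in summarize_fault(SAMPLE).items():
        print(f"{key}: {value}")
```

Run against the excerpt, it prints one field per line: state Failed, exitCode 5, faultType invalidPathFault, and the fault details, including stateWhenFailureOccurred Active and gt2ErrorCode 5. Pointing it at a saved copy of the full response message should give the same fields, since the function searches the envelope for the response element.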
