Hi,

We are experiencing a strange problem that causes jobs to fail, albeit
somewhat randomly and infrequently.  At any given time, we may have 100
active GRAM jobs on a given resource.  All of these jobs submit fine, and
there are usually no immediate failures.  However, every so often one will
fail, sometimes days after it was submitted, with the following error:

[EMAIL PROTECTED]:/export/grid_files/260600020.09477316738932795> globusrun-ws -status -j jobEPR.txt
Current job state: Failed
globusrun-ws: Job failed: Invalid executable path
"/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".
ProcessDied

Now, the resource we are submitting to is unusual in that we never actually
transfer in the hmmpfam executable.  That path is just a placeholder, and our
custom BOINC job manager does not use it, because BOINC executables live
elsewhere.  Until recently, nothing ever existed at that path, so the error
*kinda* made sense; what didn't make sense is why it only happened randomly.
In an attempt to make the problem go away, I now have the BOINC job manager
create a dummy executable at that path when the job is submitted, but that
hasn't helped: the error is still popping up.  *Now* the error message
doesn't make sense if taken at face value, because that has been a "valid
path", technically speaking, for the lifetime of the job in question, yet
the job still failed.  I've attached debug output below in case it helps,
though I wasn't able to glean any additional information from it.  Does
anyone have a guess as to why this would happen at all, let alone so
randomly and infrequently?
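
One way to confirm that the dummy file really is present and executable at
the moment a failure is reported is a small check like the sketch below.
The function name describe_executable is hypothetical and is not part of
Globus or BOINC; it only reports what the local filesystem says about the
path for the account that runs it.

```python
import os
import stat


def describe_executable(path):
    """Report whether `path` exists, is a regular file, and is executable."""
    info = {"exists": os.path.exists(path), "is_file": False,
            "executable": False, "mode": None}
    if info["exists"]:
        st = os.stat(path)
        info["is_file"] = stat.S_ISREG(st.st_mode)
        # os.access checks permissions for the user running this script,
        # which may differ from the account the job itself runs under.
        info["executable"] = os.access(path, os.X_OK)
        info["mode"] = oct(stat.S_IMODE(st.st_mode))
    return info


if __name__ == "__main__":
    print(describe_executable(
        "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam"))
```

Running this as the same user the job runs as, right after a failure, would
show whether the path had disappeared or lost its execute bit by then.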

This one is costing us big time because when a job fails, Globus deletes all
the output collected thus far, and these are large batches of work.

Thanks!
Adam

[EMAIL PROTECTED]:/export/grid_files/260600020.09477316738932795> globusrun-ws -debug -status -j jobEPR.txt

=== REQUEST MESSAGE (length 816) (time 1206627634.506184000) ===
<ns00:Envelope xmlns:ns00="http://schemas.xmlsoap.org/soap/envelope/"><ns00:Header></ns00:Header><ns00:Body>
<ns01:GetMultipleResourceProperties xmlns:ns01="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd">
<ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:state</ns01:ResourceProperty>
<ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:holding</ns01:ResourceProperty>
<ns01:ResourceProperty xmlns:ns03="http://www.globus.org/namespaces/2004/10/gram/job/faults">ns03:fault</ns01:ResourceProperty>
<ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:exitCode</ns01:ResourceProperty>
</ns01:GetMultipleResourceProperties></ns00:Body></ns00:Envelope>
----------------------------------------------

=== RESPONSE MESSAGE (length 6399) (time 1206627634.546965000) ===
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<soapenv:Header>
<wsa:MessageID soapenv:mustUnderstand="0">uuid:f6d14500-fc08-11dc-a90a-8a2384ad991f</wsa:MessageID>
<wsa:To soapenv:mustUnderstand="0">http://schemas.xmlsoap.org/ws/2004/03/addressing/role/anonymous</wsa:To>
<wsa:Action soapenv:mustUnderstand="0">http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties/GetMultipleResourcePropertiesResponse</wsa:Action>
<wsa:From soapenv:mustUnderstand="0" xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job"><wsa:Address>https://128.8.120.35:8443/wsrf/services/ManagedExecutableJobService</wsa:Address><wsa:ReferenceProperties><ns4:ResourceID xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job">a1253c20-fa72-11dc-a908-8a2384ad991f</ns4:ResourceID></wsa:ReferenceProperties></wsa:From>
<wsa:RelatesTo RelationshipType="wsa:Reply" soapenv:mustUnderstand="0">uuid:f6d04560-fc08-11dc-b962-000f1f66888a</wsa:RelatesTo>
</soapenv:Header>
<soapenv:Body>
<GetMultipleResourcePropertiesResponse xmlns="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd">
<ns1:state xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/types">Failed</ns1:state>
<ns2:holding xmlns:ns2="http://www.globus.org/namespaces/2004/10/gram/job/types">false</ns2:holding>
<ns3:fault xmlns:ns3="http://www.globus.org/namespaces/2004/10/gram/job/faults"><ns3:invalidPathFault>
<ns4:Timestamp xmlns:ns4="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">2008-03-27T06:09:37.203Z</ns4:Timestamp>
<ns5:Originator xmlns:ns5="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd"><wsa:Address>https://128.8.120.35:8443/wsrf/services/ManagedJobFactoryService</wsa:Address><wsa:ReferenceProperties><ns6:ResourceID xmlns:ns6="http://www.globus.org/namespaces/2004/10/gram/job">a1253c20-fa72-11dc-a908-8a2384ad991f</ns6:ResourceID></wsa:ReferenceProperties><wsa:ReferenceParameters/></ns5:Originator>
<ns7:Description xmlns:ns7="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">Invalid executable path &quot;/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam&quot;.</ns7:Description>
<ns8:FaultCause xmlns:ns8="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd"><ns8:Timestamp>2008-03-27T06:09:37.203Z</ns8:Timestamp><ns8:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
        at java.lang.Class.newInstance0(Class.java:350)
        at java.lang.Class.newInstance(Class.java:303)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:485)
        at org.globus.exec.utils.FaultUtils.createInvalidPathFault(FaultUtils.java:129)
        at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3184)
        at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
        at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
        at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
</ns8:ErrorCode><ns8:Description>org.globus.exec.generated.InvalidPathFaultType</ns8:Description></ns8:FaultCause>
<ns9:FaultCause xmlns:ns9="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd"><ns9:Timestamp>2008-03-27T06:09:37.203Z</ns9:Timestamp><ns9:Description>ProcessDied</ns9:Description>
<ns9:FaultCause><ns9:Timestamp>2008-03-27T06:09:37.207Z</ns9:Timestamp><ns9:ErrorCode dialect="http://www.globus.org/fault/stacktrace">java.lang.Exception: ProcessDied
        at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3127)
        at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
        at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
        at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
</ns9:ErrorCode><ns9:Description>java.lang.Exception</ns9:Description></ns9:FaultCause>
<ns9:FaultCause><ns9:Timestamp>2008-03-27T06:09:37.207Z</ns9:Timestamp><ns9:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
        at org.globus.wsrf.utils.FaultHelper.toBaseFault(FaultHelper.java:282)
        at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:505)
        at org.globus.exec.utils.FaultUtils.createInvalidPathFault(FaultUtils.java:129)
        at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3184)
        at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
        at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
        at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
</ns9:ErrorCode><ns9:Description>org.oasis.wsrf.faults.BaseFaultType</ns9:Description></ns9:FaultCause></ns9:FaultCause>
<ns3:stateWhenFailureOccurred>Active</ns3:stateWhenFailureOccurred><ns3:command>submit</ns3:command><ns3:gt2ErrorCode>5</ns3:gt2ErrorCode><ns3:attribute>executable</ns3:attribute><ns3:path>/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam</ns3:path>
</ns3:invalidPathFault></ns3:fault>
<ns10:exitCode xmlns:ns10="http://www.globus.org/namespaces/2004/10/gram/job/types">5</ns10:exitCode>
</GetMultipleResourcePropertiesResponse></soapenv:Body></soapenv:Envelope>
----------------------------------------------
Current job state: Failed
globusrun-ws: Job failed: Invalid executable path
"/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".
ProcessDied
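
The fields that matter in the response above are buried among the stack
traces, so here is a minimal sketch that pulls them out.  It assumes the
RESPONSE MESSAGE body has been saved as well-formed XML (namespace URIs
intact, no mail-wrapping damage).  The function name summarize_fault and
the trimmed SAMPLE document are illustrative only and are not part of any
Globus tool; the namespace URIs and element names are taken from the
debug output.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the globusrun-ws debug output.
TYPES_NS = "http://www.globus.org/namespaces/2004/10/gram/job/types"
FAULTS_NS = "http://www.globus.org/namespaces/2004/10/gram/job/faults"

# Direct children of the fault element that describe the failure.
FAULT_FIELDS = ("stateWhenFailureOccurred", "command", "gt2ErrorCode",
                "attribute", "path")

# Trimmed, hand-written stand-in for a real response (illustration only).
SAMPLE = """<r xmlns:t="http://www.globus.org/namespaces/2004/10/gram/job/types"
   xmlns:f="http://www.globus.org/namespaces/2004/10/gram/job/faults">
  <t:state>Failed</t:state>
  <f:fault><f:invalidPathFault>
    <f:stateWhenFailureOccurred>Active</f:stateWhenFailureOccurred>
    <f:command>submit</f:command>
    <f:gt2ErrorCode>5</f:gt2ErrorCode>
    <f:attribute>executable</f:attribute>
    <f:path>/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam</f:path>
  </f:invalidPathFault></f:fault>
</r>"""


def summarize_fault(soap_xml):
    """Return the job state and key fault fields from a response document."""
    root = ET.fromstring(soap_xml)
    summary = {}
    state = root.find(".//{%s}state" % TYPES_NS)
    if state is not None:
        summary["state"] = (state.text or "").strip()
    fault = root.find(".//{%s}fault" % FAULTS_NS)
    if fault is not None and len(fault):
        detail = fault[0]  # e.g. the invalidPathFault element
        summary["faultType"] = detail.tag.split("}", 1)[-1]
        for name in FAULT_FIELDS:
            node = detail.find("{%s}%s" % (FAULTS_NS, name))
            if node is not None:
                summary[name] = (node.text or "").strip()
    return summary


if __name__ == "__main__":
    for key, value in summarize_fault(SAMPLE).items():
        print(key, "=", value)
```

For the response above, the summary shows an invalidPathFault on the
"executable" attribute, reported for the "submit" command while the job
state was Active, with gt2ErrorCode 5.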
