Hi Adam,
I can't say much about it right now, but at first glance it looks to
me as if the application is causing the problem. I should add that I
don't know BOINC and probably haven't understood all the details.
Can you describe the role of hmmpfam a bit more:
* This is not the main executable, right?
* Is it called by the main executable somehow under certain
conditions?
* If so: you said that hmmpfam should not actually be used at all.
In what situation could the executable call hmmpfam (I love
that word ... :-) )?
Martin
Hi,
We are experiencing a strange problem that is causing jobs to fail,
albeit somewhat randomly and infrequently. At any given time, we may
have 100 active GRAM jobs on a given resource. All of these jobs
submit fine and there are usually no immediate failures. However,
every so often, one will fail, and this can be days after it was
submitted, with the following error:
[EMAIL PROTECTED]:/export/grid_files/260600020.09477316738932795> globusrun-ws -status -j jobEPR.txt
Current job state: Failed
globusrun-ws: Job failed: Invalid executable path "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".
ProcessDied
Now, the resource we are submitting to is unique in that we are not
actually transferring in the hmmpfam executable; that is just a dummy
path, and our custom BOINC job manager does not attempt to make use of
it, as BOINC executables live elsewhere. So until recently, the
executable specified on that path never existed, and so the error
*kinda* made sense; what didn't make sense is why it happened
randomly. In an attempt to make this problem go away, I now have the
BOINC job manager create a dummy executable on that path when the job
is submitted, but it doesn't look like that has helped because the
error is still popping up. *Now* the error message certainly doesn't
make sense if taken at face value, because that has been a "valid
path", technically speaking, for the lifetime of the job in
question -- yet the job still failed. If it helps, I'll attach debug
output below, though I wasn't able to glean any additional information
from it. Does anyone have a guess as to why this would happen so
randomly and infrequently, or happen in the first place?
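
A minimal sketch of the workaround just described, in Python and purely illustrative: the real BOINC job manager code is not shown in this thread, so the function names, the placeholder script body, and the temporary directory used here are all assumptions.

```python
import os
import stat
import tempfile


def make_placeholder_executable(path):
    """Create a no-op shell script at path and mark it executable."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    mode = os.stat(path).st_mode
    # Add the execute bit for user, group, and others.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def looks_like_valid_executable(path):
    """True if path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Hypothetical stand-in for the per-job application directory.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "applications", "example-job", "hmmpfam")

before = looks_like_valid_executable(target)
make_placeholder_executable(target)
after = looks_like_valid_executable(target)
print(before, after)
```

A check like this only reflects the filesystem as seen by the host and user that run it, at the moment it runs.
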
This one is costing us big time because when a job fails, Globus
deletes all the output collected thus far, and these are large batches
of work.
Thanks!
Adam
[EMAIL PROTECTED]:/export/grid_files/260600020.09477316738932795> globusrun-ws -debug -status -j jobEPR.txt
=== REQUEST MESSAGE (length 816) (time 1206627634.506184000) ===
<ns00:Envelope xmlns:ns00="http://schemas.xmlsoap.org/soap/envelope/">
<ns00:Header></ns00:Header>
<ns00:Body>
<ns01:GetMultipleResourceProperties xmlns:ns01="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd">
<ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:state</ns01:ResourceProperty>
<ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:holding</ns01:ResourceProperty>
<ns01:ResourceProperty xmlns:ns03="http://www.globus.org/namespaces/2004/10/gram/job/faults">ns03:fault</ns01:ResourceProperty>
<ns01:ResourceProperty xmlns:ns02="http://www.globus.org/namespaces/2004/10/gram/job/types">ns02:exitCode</ns01:ResourceProperty>
</ns01:GetMultipleResourceProperties>
</ns00:Body>
</ns00:Envelope>
----------------------------------------------
=== RESPONSE MESSAGE (length 6399) (time 1206627634.546965000) ===
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<soapenv:Header>
<wsa:MessageID soapenv:mustUnderstand="0">uuid:f6d14500-fc08-11dc-a90a-8a2384ad991f</wsa:MessageID>
<wsa:To soapenv:mustUnderstand="0">http://schemas.xmlsoap.org/ws/2004/03/addressing/role/anonymous</wsa:To>
<wsa:Action soapenv:mustUnderstand="0">http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties/GetMultipleResourcePropertiesResponse</wsa:Action>
<wsa:From soapenv:mustUnderstand="0" xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job">
<wsa:Address>https://128.8.120.35:8443/wsrf/services/ManagedExecutableJobService</wsa:Address>
<wsa:ReferenceProperties>
<ns4:ResourceID xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job">a1253c20-fa72-11dc-a908-8a2384ad991f</ns4:ResourceID>
</wsa:ReferenceProperties>
</wsa:From>
<wsa:RelatesTo RelationshipType="wsa:Reply" soapenv:mustUnderstand="0">uuid:f6d04560-fc08-11dc-b962-000f1f66888a</wsa:RelatesTo>
</soapenv:Header>
<soapenv:Body>
<GetMultipleResourcePropertiesResponse xmlns="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd">
<ns1:state xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/types">Failed</ns1:state>
<ns2:holding xmlns:ns2="http://www.globus.org/namespaces/2004/10/gram/job/types">false</ns2:holding>
<ns3:fault xmlns:ns3="http://www.globus.org/namespaces/2004/10/gram/job/faults">
<ns3:invalidPathFault>
<ns4:Timestamp xmlns:ns4="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">2008-03-27T06:09:37.203Z</ns4:Timestamp>
<ns5:Originator xmlns:ns5="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
<wsa:Address>https://128.8.120.35:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
<wsa:ReferenceProperties>
<ns6:ResourceID xmlns:ns6="http://www.globus.org/namespaces/2004/10/gram/job">a1253c20-fa72-11dc-a908-8a2384ad991f</ns6:ResourceID>
</wsa:ReferenceProperties>
<wsa:ReferenceParameters/>
</ns5:Originator>
<ns7:Description xmlns:ns7="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">Invalid executable path "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".</ns7:Description>
<ns8:FaultCause xmlns:ns8="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
<ns8:Timestamp>2008-03-27T06:09:37.203Z</ns8:Timestamp>
<ns8:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
    at java.lang.Class.newInstance0(Class.java:350)
    at java.lang.Class.newInstance(Class.java:303)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:485)
    at org.globus.exec.utils.FaultUtils.createInvalidPathFault(FaultUtils.java:129)
    at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3184)
    at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
    at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
    at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
</ns8:ErrorCode>
<ns8:Description>org.globus.exec.generated.InvalidPathFaultType</ns8:Description>
</ns8:FaultCause>
<ns9:FaultCause xmlns:ns9="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
<ns9:Timestamp>2008-03-27T06:09:37.203Z</ns9:Timestamp>
<ns9:Description>ProcessDied</ns9:Description>
<ns9:FaultCause>
<ns9:Timestamp>2008-03-27T06:09:37.207Z</ns9:Timestamp>
<ns9:ErrorCode dialect="http://www.globus.org/fault/stacktrace">java.lang.Exception: ProcessDied
    at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3127)
    at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
    at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
    at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
</ns9:ErrorCode>
<ns9:Description>java.lang.Exception</ns9:Description>
</ns9:FaultCause>
<ns9:FaultCause>
<ns9:Timestamp>2008-03-27T06:09:37.207Z</ns9:Timestamp>
<ns9:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
    at org.globus.wsrf.utils.FaultHelper.toBaseFault(FaultHelper.java:282)
    at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:505)
    at org.globus.exec.utils.FaultUtils.createInvalidPathFault(FaultUtils.java:129)
    at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3184)
    at org.globus.exec.service.exec.StateMachine.processWaitingForStateChangesState(StateMachine.java:1652)
    at sun.reflect.GeneratedMethodAccessor6202.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:328)
    at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
</ns9:ErrorCode>
<ns9:Description>org.oasis.wsrf.faults.BaseFaultType</ns9:Description>
</ns9:FaultCause>
</ns9:FaultCause>
<ns3:stateWhenFailureOccurred>Active</ns3:stateWhenFailureOccurred>
<ns3:command>submit</ns3:command>
<ns3:gt2ErrorCode>5</ns3:gt2ErrorCode>
<ns3:attribute>executable</ns3:attribute>
<ns3:path>/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam</ns3:path>
</ns3:invalidPathFault>
</ns3:fault>
<ns10:exitCode xmlns:ns10="http://www.globus.org/namespaces/2004/10/gram/job/types">5</ns10:exitCode>
</GetMultipleResourcePropertiesResponse>
</soapenv:Body>
</soapenv:Envelope>
----------------------------------------------
Current job state: Failed
globusrun-ws: Job failed: Invalid executable path "/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam".
ProcessDied
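
For triage, the most compact part of the response above is the set of leaf elements inside the invalidPathFault (state when the failure occurred, command, GT2 error code, attribute, path). A minimal Python sketch for pulling them out with the standard library follows. The SAMPLE document is a hand-trimmed stand-in for the real response, and summarize_fault is a hypothetical helper name, not part of Globus.

```python
import xml.etree.ElementTree as ET

# Namespace of the GRAM job fault elements, as it appears in the response.
FAULTS_NS = "http://www.globus.org/namespaces/2004/10/gram/job/faults"

# Hand-trimmed stand-in for the RESPONSE MESSAGE body; only the leaf
# elements of the invalidPathFault are kept.
SAMPLE = """\
<ns3:fault xmlns:ns3="http://www.globus.org/namespaces/2004/10/gram/job/faults">
  <ns3:invalidPathFault>
    <ns3:stateWhenFailureOccurred>Active</ns3:stateWhenFailureOccurred>
    <ns3:command>submit</ns3:command>
    <ns3:gt2ErrorCode>5</ns3:gt2ErrorCode>
    <ns3:attribute>executable</ns3:attribute>
    <ns3:path>/export/scratch/applications/a5671f0138bc65dc700001aa80a3f378/hmmpfam</ns3:path>
  </ns3:invalidPathFault>
</ns3:fault>
"""

FIELDS = ("stateWhenFailureOccurred", "command", "gt2ErrorCode",
          "attribute", "path")


def summarize_fault(xml_text):
    """Return the leaf fields of the first GRAM job fault in xml_text."""
    root = ET.fromstring(xml_text)
    summary = {}
    for name in FIELDS:
        node = root.find(".//{%s}%s" % (FAULTS_NS, name))
        if node is not None and node.text:
            # Mail wrapping can leave stray whitespace inside values, so
            # all whitespace is dropped (assumes values contain no spaces).
            summary[name] = "".join(node.text.split())
    return summary


summary = summarize_fault(SAMPLE)
for key, value in summary.items():
    print(key, "=", value)
```

Logging these fields for each failed job would make it easier to see whether every occurrence reports the same state, command, and error code.
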