[
https://issues.apache.org/jira/browse/TAVERNA-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696959#comment-14696959
]
Stian Soiland-Reyes commented on TAVERNA-871:
---------------------------------------------
When a job is waiting for a control link, it is queued.
https://github.com/apache/incubator-taverna-engine/blob/master/taverna-workflowmodel-impl/src/main/java/org/apache/taverna/workflowmodel/processor/dispatch/impl/DispatchStackImpl.java#L188
is where the error occurs. My first guess would be that:
queues.get(owningProcess) returns null
which is odd given that it just did:
synchronized (queues) {
// ...
queues.containsKey(owningProcess)
But down here where the queue is purged it is NOT synchronized:
https://github.com/apache/incubator-taverna-engine/blob/master/taverna-workflowmodel-impl/src/main/java/org/apache/taverna/workflowmodel/processor/dispatch/impl/DispatchStackImpl.java#L302
So perhaps changing this to:
synchronized (queues) {
    queues.remove(owningProcess);
}
would do the trick.
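To illustrate the idea, here is a minimal sketch of the locking pattern. It is not the actual DispatchStackImpl code: the class QueueRegistry and the methods enqueue, purge and pending are hypothetical names made up for this example, and the queue holds plain strings instead of dispatch events. The point is only that the purge takes the same lock as the containsKey/get pair, so an entry cannot disappear between the check and the lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for the per-process queue map; not Taverna code. */
class QueueRegistry {
    private final Map<String, BlockingQueue<String>> queues = new HashMap<>();

    /** Check and lookup run under one lock, so get() cannot return null here. */
    boolean enqueue(String owningProcess, String job) {
        synchronized (queues) {
            if (!queues.containsKey(owningProcess)) {
                queues.put(owningProcess, new LinkedBlockingQueue<>());
            }
            return queues.get(owningProcess).add(job);
        }
    }

    /** The purge takes the same lock, so it cannot interleave with enqueue(). */
    void purge(String owningProcess) {
        synchronized (queues) {
            queues.remove(owningProcess);
        }
    }

    /** Number of queued jobs, or 0 if the process has been purged. */
    int pending(String owningProcess) {
        synchronized (queues) {
            BlockingQueue<String> queue = queues.get(owningProcess);
            return queue == null ? 0 : queue.size();
        }
    }
}
```

If the purge did not synchronize on queues, another thread could remove the entry between containsKey and get, which would match the null seen in the stack trace. Whether that is really what happens in the engine still needs to be confirmed by testing.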
To test this with the Taverna 2.5 command line tool, make the same change on
the old/core-1.5 tag (note the different folders and Java package name):
https://github.com/apache/incubator-taverna-engine/blob/old/core-1.5/workflowmodel-impl/src/main/java/net/sf/taverna/t2/workflowmodel/processor/dispatch/impl/DispatchStackImpl.java#L314
Then run mvn clean install for workflowmodel-impl, copy the rebuilt
workflowmodel-impl-1.5.jar (which is then no longer the released 1.5) into the
corresponding folder under the repository/ folder of your Taverna installation,
and rerun the workflow tests.
Is anyone able to have a go at this and see if it fixes the issue?
(Obviously you would first need to verify that running
control-0003x0005.t2flow in a loop reproduces the failure.)
> Race condition: occasional NullPointerException in DispatchStackImpl
> --------------------------------------------------------------------
>
> Key: TAVERNA-871
> URL: https://issues.apache.org/jira/browse/TAVERNA-871
> Project: Apache Taverna
> Issue Type: Bug
> Components: Taverna Engine
> Reporter: Stian Soiland-Reyes
> Fix For: engine 3.1.0
>
> Attachments: control-0003x0005.t2flow
>
>
> Raised by Javier Rojas Balderrama on users:
> {quote}
> I'm executing some workflows by command line and from time to time the
> execution freezes so I have to start over the execution. The log associated
> to this is below. Is this only a command line issue or a general behaviour?
> {quote}
> {code}
> INFO 2015-07-29 08:49:11,170
> (de.uni_luebeck.inb.knowarc.usecases.invocation.local.LocalUseCaseInvocation:116)
> - mainTempDirectory is /tmp
> INFO 2015-07-29 08:49:11,170
> (de.uni_luebeck.inb.knowarc.usecases.invocation.local.LocalUseCaseInvocation:117)
> - Using tempDir /tmp/usecase1293979972903664991dir
> INFO 2015-07-29 08:49:11,170
> (net.sf.taverna.t2.activities.externaltool.ExternalToolActivity:237) - Run id
> is cddff932-9804-4a55-bef7-4679eab931c5
> INFO 2015-07-29 08:49:11,170
> (de.uni_luebeck.inb.knowarc.usecases.invocation.local.LocalUseCaseInvocation:351)
> - cmds[0] = /bin/sh
> INFO 2015-07-29 08:49:11,170
> (de.uni_luebeck.inb.knowarc.usecases.invocation.local.LocalUseCaseInvocation:351)
> - cmds[1] = -c
> INFO 2015-07-29 08:49:11,170
> (de.uni_luebeck.inb.knowarc.usecases.invocation.local.LocalUseCaseInvocation:351)
> - cmds[2] = sleep 0
> INFO 2015-07-29 08:49:11,170
> (de.uni_luebeck.inb.knowarc.usecases.invocation.local.LocalUseCaseInvocation:353)
> - Command is sleep 0 in directory /tmp/usecase1293979972903664991dir
> WARN 2015-07-29 08:49:11,173
> (net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke:236) -
> Failed (INVOCATION) invoking
> net.sf.taverna.t2.activities.externaltool.ExternalToolActivity@415ce54f for
> job DispatchJobEvent facade0:control-0045x0035:S_1567[]: Uncaught exception
> while invoking
> net.sf.taverna.t2.activities.externaltool.ExternalToolActivity@415ce54f
> java.lang.NullPointerException
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.impl.DispatchStackImpl.satisfyConditions(DispatchStackImpl.java:188)
> at
> net.sf.taverna.t2.workflowmodel.impl.ProcessorImpl$2.finishedWith(ProcessorImpl.java:176)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.impl.DispatchStackImpl$TopLayer.sendCachePurge(DispatchStackImpl.java:313)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.impl.DispatchStackImpl$TopLayer.receiveResult(DispatchStackImpl.java:281)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize.receiveResult(Parallelize.java:165)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.AbstractDispatchLayer.receiveResult(AbstractDispatchLayer.java:85)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.AbstractErrorHandlerLayer.receiveResult(AbstractErrorHandlerLayer.java:136)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.AbstractErrorHandlerLayer.receiveResult(AbstractErrorHandlerLayer.java:136)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.AbstractDispatchLayer.receiveResult(AbstractDispatchLayer.java:85)
> at
> net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke$InvokeCallBack.receiveResult(Invoke.java:352)
> at
> net.sf.taverna.t2.activities.externaltool.ExternalToolActivity$1.run(ExternalToolActivity.java:272)
> at java.lang.Thread.run(Thread.java:745)
> ERROR 2015-07-29 08:49:11,174
> (net.sf.taverna.t2.workflowmodel.processor.dispatch.AbstractErrorHandlerLayer:200)
> - Could not find any active jobs for facade0:control-0045x0035:S_1567
> {code}