[
https://issues.apache.org/jira/browse/FLINK-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen resolved FLINK-230.
--------------------------------
Resolution: Fixed
Fix Version/s: (was: pre-apache)
0.7-incubating
Assignee: Stephan Ewen
Fixed in ae139f5ae2199a52e8d7f561f94db51631107d00
> Job Cancellation does not work properly: "Cannot find execution graph to job
> ID"
> --------------------------------------------------------------------------------
>
> Key: FLINK-230
> URL: https://issues.apache.org/jira/browse/FLINK-230
> Project: Flink
> Issue Type: Bug
> Reporter: GitHub Import
> Assignee: Stephan Ewen
> Labels: github-import
> Fix For: 0.7-incubating
>
>
> Hi,
> I noticed this error message on a failing Job.
> ```
> 12:37:10,697 INFO eu.stratosphere.nephele.execution.ExecutionStateTransition
> - JM: ExecutionState set from CANCELING to CANCELED for task Invoices file
> (7/8)
> 12:37:10,697 INFO eu.stratosphere.nephele.execution.ExecutionStateTransition
> - JM: ExecutionState set from CANCELING to CANCELED for task
> ([#2|https://github.com/stratosphere/stratosphere/issues/2] |
> [FLINK-2|https://issues.apache.org/jira/browse/FLINK-2]) filter invoices:
> month <= 12 (8/8)
> 12:37:10,697 INFO
> eu.stratosphere.nephele.jobmanager.scheduler.AbstractScheduler - Releasing
> instance hadoop02
> 12:37:10,699 INFO eu.stratosphere.nephele.jobmanager.JobManager
> - Status of job XX 0b7407b5ad73a40043c36c16baacf400) changed to FAILED
> 12:37:10,706 ERROR eu.stratosphere.nephele.jobmanager.JobManager
> - Cannot find execution graph to job ID 0b7407b5ad73a40043c36c16baacf400
> 12:37:10,706 ERROR eu.stratosphere.nephele.jobmanager.JobManager
> - Cannot find execution graph to job ID 0b7407b5ad73a40043c36c16baacf400
> 12:37:10,709 ERROR eu.stratosphere.nephele.jobmanager.JobManager
> - Cannot find execution graph to job ID 0b7407b5ad73a40043c36c16baacf400
> ```
> The error occurs quite often:
> ```
> rmetzger@hadoop01:~/log$ cat nephele-rmetzger-jobmanager-hadoop01.log | grep
> "Cannot find" | wc -l
> 21262
> ```
> The TaskManager also reports errors:
> ```
> 12:37:14,951 ERROR
> eu.stratosphere.nephele.taskmanager.bytebuffered.ByteBufferedChannelManager
> - Cannot find task(s) waiting for data from source channel with ID
> 43930c029c759c003792e4dfd4411800
> 12:37:14,952 ERROR
> eu.stratosphere.nephele.taskmanager.bytebuffered.ByteBufferedChannelManager
> - Cannot find task(s) waiting for data from source channel with ID
> 0937e32b635954000efb7f68c0c80c00
> 12:37:14,953 ERROR
> eu.stratosphere.nephele.taskmanager.bytebuffered.ByteBufferedChannelManager
> - Cannot find task(s) waiting for data from source channel with ID
> 6922a9feb031540009dbe583b3fbe800
> ```
> ```
> rmetzger@hadoop01:~/log$ cat nephele-rmetzger-taskmanager-hadoop01.log | grep
> "for data from source channel" | wc -l
> 6612
> ```
> I also saw this:
> ```
> 12:00:00,221 ERROR eu.stratosphere.nephele.execution.ExecutionStateTransition
> - java.lang.IllegalStateException: Unexpected state change: CANCELING ->
> FAILED
> at
> eu.stratosphere.nephele.execution.ExecutionStateTransition.checkTransition(ExecutionStateTransition.java:167)
> at
> eu.stratosphere.nephele.executiongraph.ExecutionVertex.updateExecutionState(ExecutionVertex.java:384)
> at
> eu.stratosphere.nephele.executiongraph.ExecutionVertex$1.run(ExecutionVertex.java:319)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> ```
> I did not see this behavior before, so it could be new (I did not make any
> major changes to the job).
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/230
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: bug, runtime,
> Created at: Fri Nov 01 15:02:46 CET 2013
> State: open