[
https://issues.apache.org/jira/browse/FLINK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735176#comment-15735176
]
ASF GitHub Bot commented on FLINK-5114:
---------------------------------------
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/2975#discussion_r91706223
--- Diff:
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
---
@@ -503,15 +504,37 @@ class TaskManager(
)
}
- case PartitionState(taskExecutionId, taskResultId, partitionId,
state) =>
- Option(runningTasks.get(taskExecutionId)) match {
+ // Updates the partition producer state
+ case PartitionProducerState(receiverExecutionId, result) =>
+ Option(runningTasks.get(receiverExecutionId)) match {
case Some(task) =>
- task.onPartitionStateUpdate(taskResultId, partitionId, state)
+ try {
+ result match {
+ case Left((intermediateDataSetId, resultPartitionId,
producerState)) =>
+ // Forward the state update to the task
+ task.onPartitionStateUpdate(
+ intermediateDataSetId,
+ resultPartitionId.getPartitionId,
+ producerState)
+
+ case Right(failure) =>
+ // Cancel or fail the execution
+ if
(failure.isInstanceOf[PartitionProducerDisposedException]) {
+ log.debug("Partition producer disposed. Cancelling
execution.", failure)
--- End diff --
I think this log statement should be on `info` level. Otherwise, going
through the logs in standard settings (only info level logging) leaves you
wondering why the task is cancelled all of a sudden.
> PartitionState update with finished execution fails
> ---------------------------------------------------
>
> Key: FLINK-5114
> URL: https://issues.apache.org/jira/browse/FLINK-5114
> Project: Flink
> Issue Type: Bug
> Components: Network
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
>
> If a partition state request is triggered for a producer that finishes before
> the request arrives, the execution is unregistered and the producer cannot be
> found. In this case the PartitionState returns null and the job fails.
> We need to check the producer location via the intermediate result partition
> in this case.
> See here: https://api.travis-ci.org/jobs/177668505/log.txt?deansi=true
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)