[ https://issues.apache.org/jira/browse/FLINK-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155054#comment-15155054 ]
ASF GitHub Bot commented on FLINK-3453:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/1678
[FLINK-3453] [runtime, runtime-web] Report partial stack trace sample for
cleared tasks
Ongoing stack trace samples were considered failed if the tasks were
cleared concurrently. Instead, they now report a partial sample success (i.e.
fewer samples taken than planned). The job manager ignores this result once
the job state is terminal; otherwise it is displayed as a partial result for
a brief period of time.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 3453-tm_logs_bp_sampling
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1678.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1678
----
commit b26ff5b8f35c46d8567ed6c69e660c7f78f9ca4e
Author: Ufuk Celebi <[email protected]>
Date: 2016-02-19T22:43:09Z
[FLINK-3453] [runtime, runtime-web] Report partial stack trace sample for
cleared tasks
----
> Fix TaskManager logs exception when sampling backpressure while task completes
> ------------------------------------------------------------------------------
>
> Key: FLINK-3453
> URL: https://issues.apache.org/jira/browse/FLINK-3453
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime
> Affects Versions: 1.0.0
> Reporter: Greg Hogan
> Assignee: Ufuk Celebi
> Priority: Minor
>
> Backpressure sampling is interrupted when a task completes. It may be best to
> create a new response class for this case.
> {noformat}
> java.lang.IllegalStateException: Cannot sample task 08f138723e8174e70f5e7ddc672f8954. Task was removed after 65 sample(s).
>     at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleStackTraceSampleMessage(TaskManager.scala:743)
>     at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:277)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>     at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2016-02-19 09:36:52,773 ERROR org.apache.flink.runtime.webmonitor.BackPressureStatsTracker - Failed to gather stack trace sample.
> java.lang.RuntimeException: Discarded
>     at org.apache.flink.runtime.webmonitor.StackTraceSampleCoordinator$PendingStackTraceSample.discard(StackTraceSampleCoordinator.java:394)
>     at org.apache.flink.runtime.webmonitor.StackTraceSampleCoordinator.cancelStackTraceSample(StackTraceSampleCoordinator.java:249)
>     at org.apache.flink.runtime.webmonitor.StackTraceSampleCoordinator$StackTraceSampleCoordinatorActor.handleMessage(StackTraceSampleCoordinator.java:462)
>     at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:97)
>     at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: Cannot sample task 08f138723e8174e70f5e7ddc672f8954. Task was removed after 65 sample(s).
>     at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleStackTraceSampleMessage(TaskManager.scala:743)
>     at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:277)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>     at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>     at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>     ... 9 more
> {noformat}