ibzib commented on pull request #14719: URL: https://github.com/apache/beam/pull/14719#issuecomment-836955490
> Arrghh seems metrics are broken for the Portable runner. Can you maybe what is the deal @ibzib ? https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Phrase/131/ Looks like it's hitting an NPE while unregistering metrics because the [taskExecutorManager](https://github.com/apache/flink/blob/9fd6ecf18ddb744a971f575ab11eb19b44383899/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java#L705) is null. This seems like a Flink bug to me -- if registers these metrics, they should remain accessible even after the slot manager has been shut down. Getting these metrics should probably return 0 rather than NPE. ``` May 08, 2021 4:57:46 AM org.apache.flink.runtime.metrics.MetricRegistryImpl unregister WARNING: Error while unregistering metric: taskSlotsAvailable. java.lang.NullPointerException at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.getNumberFreeSlots(DeclarativeSlotManager.java:715) at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.lambda$registerSlotManagerMetrics$2(DeclarativeSlotManager.java:200) at org.apache.beam.runners.flink.metrics.Metrics.toString(Metrics.java:33) at org.apache.beam.runners.flink.metrics.FileReporter.notifyOfRemovedMetric(FileReporter.java:72) at org.apache.flink.runtime.metrics.MetricRegistryImpl.unregister(MetricRegistryImpl.java:435) at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.close(AbstractMetricGroup.java:333) at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.close(DeclarativeSlotManager.java:241) at org.apache.flink.runtime.resourcemanager.ResourceManager.stopResourceManagerServices(ResourceManager.java:298) at org.apache.flink.runtime.resourcemanager.ResourceManager.onStop(ResourceManager.java:275) at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) May 08, 2021 4:57:46 AM org.apache.flink.runtime.metrics.MetricRegistryImpl unregister WARNING: Error while unregistering metric: taskSlotsTotal. java.lang.NullPointerException at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.getNumberRegisteredSlots(DeclarativeSlotManager.java:705) at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.lambda$registerSlotManagerMetrics$3(DeclarativeSlotManager.java:202) at org.apache.beam.runners.flink.metrics.Metrics.toString(Metrics.java:33) at org.apache.beam.runners.flink.metrics.FileReporter.notifyOfRemovedMetric(FileReporter.java:72) at org.apache.flink.runtime.metrics.MetricRegistryImpl.unregister(MetricRegistryImpl.java:435) at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.close(AbstractMetricGroup.java:333) at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.close(DeclarativeSlotManager.java:241) at org.apache.flink.runtime.resourcemanager.ResourceManager.stopResourceManagerServices(ResourceManager.java:298) at org.apache.flink.runtime.resourcemanager.ResourceManager.onStop(ResourceManager.java:275) at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
