Sergey Nuyanzin created FLINK-25904:
---------------------------------------
Summary: NullArgumentException in case of increasing number of nodes for the job
Key: FLINK-25904
URL: https://issues.apache.org/jira/browse/FLINK-25904
Project: Flink
Issue Type: Bug
Components: Runtime / Metrics
Affects Versions: 1.14.3
Reporter: Sergey Nuyanzin
We have a job running on one node. After increasing the number of nodes
(e.g. to 3), the job starts failing on the new nodes with:
{noformat}ERROR Unhandled exception. (org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler:260)
org.apache.commons.math3.exception.NullArgumentException: input array
    at org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1650) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:158) >
    at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:52) ~[flink-dist_2.12-1.14.3.>
    at org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:129) ~[fli>
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:84) ~[flink-dist_2.12-1.14>
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:58) ~[flink-dist_2.12-1.14>
    at org.apache.flink.runtime.rest.handler.job.AbstractAccessExecutionGraphHandler.handleRequest(AbstractAccessExecutionGraphHandler.java:68) ~[flink-dist_2.12-1.14.3>
    at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:87) ~[flink-dist_2.12-1.14.3.ja>
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
{noformat}