[
https://issues.apache.org/jira/browse/FLINK-33588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tongtong Zhu updated FLINK-33588:
---------------------------------
Flags: Patch,Important
Language: java
Description:
When a Flink task first starts, no checkpoint statistics have been recorded
yet, so the checkpoint data is null. Percentile then throws a
NullArgumentException when computing the percentile. After repeated testing, I
found that the fix is to set an initial value for the checkpoint statistics
when the checkpoint data is null (i.e. at the beginning of the task).
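A minimal sketch of the kind of guard described above, under stated assumptions: `SafeQuantile` and its `quantile` method are hypothetical names used for illustration only. This is not the actual Flink patch, and it does not call commons-math3; the plain linear interpolation used here is not necessarily the estimation method that `Percentile` uses. It only shows the idea of returning a defined placeholder value (here `Double.NaN`) when no data has been recorded yet, instead of passing a null array on to the computation.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Flink or commons-math3. */
final class SafeQuantile {

    private SafeQuantile() {}

    /**
     * Returns the q-quantile (0 <= q <= 1) of the given values using simple
     * linear interpolation, or Double.NaN when no values are available yet.
     */
    static double quantile(double[] values, double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1], got " + q);
        }
        // Guard: before the first checkpoint there is nothing to evaluate,
        // so return a placeholder instead of failing on a null input array.
        if (values == null || values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone(); // do not mutate the caller's array
        Arrays.sort(sorted);
        double position = q * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }
}
```

With this guard, a request made before any checkpoint has completed would get `Double.NaN` for each quantile rather than an unhandled exception. Whether `NaN`, zero, or some other initial value is the right choice for the REST response is a design decision for the actual fix.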
The exception logged for this bug is:
2023-09-13 15:02:54,608 ERROR org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler [] - Unhandled exception.
org.apache.commons.math3.exception.NullArgumentException: input array
    at org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1650) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:129) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:84) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:58) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.AbstractAccessExecutionGraphHandler.handleRequest(AbstractAccessExecutionGraphHandler.java:68) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:87) ~[flink-dist_2.12-1.14.5.jar:1.14.5]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) [?:1.8.0_151]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) [?:1.8.0_151]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_151]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
was: When the Flink task is first started, the checkpoint data is null due to
the lack of data, and Percentile throws a null pointer exception when
calculating the percentage. After multiple tests, I found that it is necessary
to set an initial value for the statistical data value of the checkpoint when
the checkpoint data is null (i.e. at the beginning of the task) to solve this
problem.
> Fix Flink Checkpointing Statistics Bug
> --------------------------------------
>
> Key: FLINK-33588
> URL: https://issues.apache.org/jira/browse/FLINK-33588
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.5, 1.16.0, 1.17.0, 1.15.2, 1.14.6, 1.18.0, 1.17.1
> Reporter: Tongtong Zhu
> Priority: Major
> Fix For: 1.19.0, 1.18.1
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)