[ 
https://issues.apache.org/jira/browse/FLINK-33588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797041#comment-17797041
 ] 

Tongtong Zhu commented on FLINK-33588:
--------------------------------------

[~jingge] CI report is no problem is Success

!image-2023-12-15-13-59-28-201.png!

> Fix Flink Checkpointing Statistics Bug
> --------------------------------------
>
>                 Key: FLINK-33588
>                 URL: https://issues.apache.org/jira/browse/FLINK-33588
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.5, 1.16.0, 1.17.0, 1.15.2, 1.14.6, 1.18.0, 1.17.1
>            Reporter: Tongtong Zhu
>            Assignee: Tongtong Zhu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.19.0, 1.18.1
>
>         Attachments: FLINK-33588.patch, image-2023-12-11-17-35-23-391.png, 
> image-2023-12-13-11-35-43-780.png, image-2023-12-15-13-59-28-201.png, 
> newCommit-FLINK-33688.patch
>
>
> When the Flink task is first started, the checkpoint data is null due to the 
> lack of data, and Percentile throws a null pointer exception when calculating 
> the percentage. After multiple tests, I found that it is necessary to set an 
> initial value for the statistical data value of the checkpoint when the 
> checkpoint data is null (i.e. at the beginning of the task) to solve this 
> problem.
> The following is an abnormal description of the bug:
> 2023-09-13 15:02:54,608 ERROR 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler
>  [] - Unhandled exception.
> org.apache.commons.math3.exception.NullArgumentException: input array
>     at 
> org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1650) 
> ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:129)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:84)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:58)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.rest.handler.job.AbstractAccessExecutionGraphHandler.handleRequest(AbstractAccessExecutionGraphHandler.java:68)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:87)
>  ~[flink-dist_2.12-1.14.5.jar:1.14.5]
>     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) 
> [?:1.8.0_151]
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  [?:1.8.0_151]
>     at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  [?:1.8.0_151]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_151]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_151]
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  [?:1.8.0_151]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_151]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_151]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to