[ https://issues.apache.org/jira/browse/FLINK-31963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721179#comment-17721179 ]
Hangxiang Yu commented on FLINK-31963:
--------------------------------------

Hi, [~pnowojski]. I suspect this is not related to the unified file merging of unaligned checkpoints, because I have also hit the above exception in 1.15. My job is fairly complicated, so I tried to simplify it to reproduce the problem, but I haven't managed to yet. I will share more if I can reproduce it with a simple job or an ITCase.

> java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned
> checkpoints
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-31963
>                 URL: https://issues.apache.org/jira/browse/FLINK-31963
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.17.0
>         Environment: Flink: 1.17.0
> FKO: 1.4.0
> StateBackend: RocksDB (Generic Incremental Checkpoint & Unaligned Checkpoint
> enabled)
>            Reporter: Tan Kim
>            Priority: Critical
>              Labels: stability
>         Attachments: image-2023-04-29-02-49-05-607.png, jobmanager_error.txt,
> taskmanager_error.txt
>
>
> I'm testing the Autoscaler through the Kubernetes Operator and I'm facing the
> following issue.
> As you know, when a job is scaled down through the autoscaler, the job
> manager and task manager go down and then come back up again.
> When this happens, an index out of bounds exception is thrown and the state
> is not restored from a checkpoint.
> [~gyfora] told me via the Flink Slack troubleshooting channel that this is
> likely an issue with Unaligned Checkpoints and not with the
> autoscaler, but I'm opening a ticket with Gyula for more clarification.
> Please see the attached JM and TM error logs.
> Thank you.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)