[jira] [Commented] (FLINK-31963) java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned checkpoints

Stefan Richter (Jira) Mon, 08 May 2023 09:53:11 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720596#comment-17720596
 ]


Stefan Richter commented on FLINK-31963:
----------------------------------------

Hi, just to clarify: when you say a checkpoint that fails once fails always - 
does this only apply for restore with rescaling or can you also not recover 
from the CP when the parallelism remains unchanged? If it only happens with 
rescaling, can you at least recover for some parallelism values or for no 
change at all?

> java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned 
> checkpoints
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-31963
>                 URL: https://issues.apache.org/jira/browse/FLINK-31963
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.17.0
>         Environment: Flink: 1.17.0
> FKO: 1.4.0
> StateBackend: RocksDB(Genetic Incremental Checkpoint & Unaligned Checkpoint 
> enabled)
>            Reporter: Tan Kim
>            Priority: Critical
>              Labels: stability
>         Attachments: image-2023-04-29-02-49-05-607.png, jobmanager_error.txt, 
> taskmanager_error.txt
>
>
> I'm testing Autoscaler through Kubernetes Operator and I'm facing the 
> following issue.
> As you know, when a job is scaled down through the autoscaler, the job 
> manager and task manager go down and then back up again.
> When this happens, an index out of bounds exception is thrown and the state 
> is not restored from a checkpoint.
> [~gyfora] told me via the Flink Slack troubleshooting channel that this is 
> likely an issue with Unaligned Checkpoint and not an issue with the 
> autoscaler, but I'm opening a ticket with Gyula for more clarification.
> Please see the attached JM and TM error logs.
> Thank you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-31963) java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned checkpoints

Reply via email to