[jira] [Commented] (FLINK-31963) java.lang.ArrayIndexOutOfBoundsException when scale down via autoscaler

Tan Kim (Jira) Fri, 28 Apr 2023 11:04:06 -0700


    [ https://issues.apache.org/jira/browse/FLINK-31963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717771#comment-17717771 ]


Tan Kim commented on FLINK-31963:
---------------------------------

A question unrelated to this ticket: why do all the subtasks in the job graph 
above appear merged into one?
For source scaling to work, the outputRecords value needs to be non-zero. 
Because the operators downstream of the Kafka source are not separated from it 
in the job graph, outputRecords is reported as zero. We therefore explicitly 
added a keyBy operator after the Kafka source to separate them intentionally, 
so that the outputRecords value can be calculated.
(I don't think this is very good for performance.) Is there any other way to 
ensure that the stream is split in two at the desired location in the job 
graph?
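
One coarse alternative to the keyBy workaround, sketched here as an assumption 
rather than a confirmed recommendation: Flink has a `pipeline.operator-chaining` 
option that, when set to false, stops operators from being chained into a single 
job graph vertex. With the Kubernetes Operator it could be set through 
`spec.flinkConfiguration` of the FlinkDeployment. The fragment below shows only 
that setting; the rest of the resource is omitted.

```yaml
# Sketch only: disables operator chaining for the whole job, so every
# operator becomes its own vertex in the job graph (not just the source).
spec:
  flinkConfiguration:
    pipeline.operator-chaining: "false"
```

This applies to every operator and adds serialization between them, so it may 
cost more than the keyBy. For a break at one specific location, the DataStream 
API also has `startNewChain()` and `disableChaining()` on individual operators, 
which should avoid the hash shuffle that keyBy introduces.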

> java.lang.ArrayIndexOutOfBoundsException when scale down via autoscaler
> -----------------------------------------------------------------------
>
>                 Key: FLINK-31963
>                 URL: https://issues.apache.org/jira/browse/FLINK-31963
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.17.0
>         Environment: Flink: 1.17.0
> FKO: 1.4.0
> StateBackend: RocksDB (Generic Incremental Checkpoint & Unaligned Checkpoint 
> enabled)
>            Reporter: Tan Kim
>            Priority: Critical
>              Labels: stability
>         Attachments: image-2023-04-29-02-48-46-279.png, 
> image-2023-04-29-02-49-05-607.png, jobmanager_error.txt, taskmanager_error.txt
>
>
> I'm testing Autoscaler through Kubernetes Operator and I'm facing the 
> following issue.
> As you know, when a job is scaled down through the autoscaler, the job 
> manager and task manager go down and then back up again.
> When this happens, an index out of bounds exception is thrown and the state 
> is not restored from a checkpoint.
> [~gyfora] told me via the Flink Slack troubleshooting channel that this is 
> likely an issue with Unaligned Checkpoint and not an issue with the 
> autoscaler, but I'm opening a ticket with Gyula for more clarification.
> Please see the attached JM and TM error logs.
> Thank you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
