[jira] [Commented] (BEAM-10927) Beam Flink Runner 1.10 checkpoint failure

Beam JIRA Bot (Jira) Sat, 15 May 2021 10:20:11 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345089#comment-17345089
 ]


Beam JIRA Bot commented on BEAM-10927:
--------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it 
has been labeled "stale-P2". If this issue is still affecting you, we care! 
Please comment and remove the label. Otherwise, in 14 days the issue will be 
moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed 
explanation of what these priorities mean.


> Beam Flink Runner 1.10 checkpoint failure
> -----------------------------------------
>
>                 Key: BEAM-10927
>                 URL: https://issues.apache.org/jira/browse/BEAM-10927
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.23.0
>            Reporter: Omkar Deshpande
>            Priority: P2
>              Labels: stale-P2
>
> I recently upgraded to beam-runners-flink-1.10 v2.23.0 from 
> beam-runners-flink-1.9 v2.23.0, and also upgraded the Flink server from 
> 1.9.3 to 1.10.2.
> The Beam pipeline reads from and writes to Kafka using KafkaIO, with an 
> in-memory ParDo between PBegin and PDone. The application is configured to 
> use S3 for checkpointing, and the state backend is RocksDB.
> This pipeline worked as expected with beam-runners-flink-1.9. But after 
> upgrading to beam-runners-flink-1.10, the checkpoints keep timing out. I 
> have tried increasing the checkpoint timeout to several hours, but the 
> checkpoints still time out.
> There are no exceptions in the log. Based on the logs, neither the 
> synchronous nor the asynchronous phase of checkpointing is happening. 
> Usually a "Trigger checkpoint" log statement is followed by "Confirm 
> checkpoint" when the checkpoint succeeds. But with 1.10, I only see 
> "Trigger checkpoint", with no confirmation of completion or even an 
> indication of progress. There is enough CPU and memory available, and 
> there is no deadlock.
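
The log symptom described above (a "Trigger checkpoint" entry with no
matching "Confirm checkpoint") could be checked mechanically with a small
script. The following is only a rough sketch: the function name
unconfirmed_checkpoints is hypothetical, and it assumes that each phrase
quoted in the report is immediately followed by a numeric checkpoint id,
which may not match the real log line format.

```python
import re
from typing import Iterable, List

# Assumption: the phrases quoted in the report are followed by a numeric
# checkpoint id. Adjust both patterns to the actual log line format.
TRIGGER = re.compile(r"Trigger checkpoint (\d+)")
CONFIRM = re.compile(r"Confirm checkpoint (\d+)")


def unconfirmed_checkpoints(lines: Iterable[str]) -> List[int]:
    """Return ids that were triggered but never confirmed, in trigger order."""
    triggered: List[int] = []
    confirmed = set()
    for line in lines:
        match = TRIGGER.search(line)
        if match:
            checkpoint_id = int(match.group(1))
            # Record each triggered id once, preserving first-seen order.
            if checkpoint_id not in triggered:
                triggered.append(checkpoint_id)
            continue
        match = CONFIRM.search(line)
        if match:
            confirmed.add(int(match.group(1)))
    return [cid for cid in triggered if cid not in confirmed]
```

Feeding it the lines of a task manager log file would list the checkpoints
that never reached confirmation. With the behaviour reported here, every
triggered id would be expected to appear in the result.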



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
