[ https://issues.apache.org/jira/browse/BEAM-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345089#comment-17345089 ]
Beam JIRA Bot commented on BEAM-10927:
--------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days so it
has been labeled "stale-P2". If this issue is still affecting you, we care!
Please comment and remove the label. Otherwise, in 14 days the issue will be
moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed
explanation of what these priorities mean.
> Beam Flink Runner 1.10 checkpoint failure
> -----------------------------------------
>
> Key: BEAM-10927
> URL: https://issues.apache.org/jira/browse/BEAM-10927
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Affects Versions: 2.23.0
> Reporter: Omkar Deshpande
> Priority: P2
> Labels: stale-P2
>
> Recently upgraded to beam-runners-flink-1.10 v2.23.0 from
> beam-runners-flink-1.9 v2.23.0. Also upgraded the Flink server to 1.10.2
> from 1.9.3.
> The Beam pipeline reads from KafkaIO and writes to KafkaIO, with an
> in-memory ParDo between PBegin and PDone. The application is configured to
> use S3 for checkpointing, and the state backend is RocksDB.
> This Beam pipeline was working as expected with beam-runners-flink-1.9,
> but after upgrading to beam-runners-flink-1.10 the checkpoints keep
> timing out. I have tried increasing the timeout to several hours, but
> checkpoints still time out.
> There are no exceptions in the log. Based on the logs, neither the
> synchronous nor the asynchronous phase of checkpointing is happening.
> Usually the "Trigger checkpoint" log statement is followed by "Confirm
> checkpoint" when the checkpoint succeeds. But with 1.10, I only see
> "Trigger checkpoint" and no confirmation of completion or even indication
> of progress. There is enough CPU and memory available, and there is no
> deadlock.
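
The log check described in the report (a "Trigger checkpoint" line with no
matching "Confirm checkpoint") can be automated with a small script. The
following is a minimal sketch, not part of the original report. It assumes
each relevant log line contains one of the two quoted phrases followed
immediately by a numeric checkpoint id; the exact log format depends on the
Flink version and logging configuration, so the pattern may need adjusting.
The function name and the sample lines are hypothetical.

```python
import re

# Assumption: lines look like "... Trigger checkpoint <id>..." or
# "... Confirm checkpoint <id>...". Adjust the pattern to your log format.
CHECKPOINT_LINE = re.compile(r"\b(Trigger|Confirm) checkpoint (\d+)")


def unconfirmed_checkpoints(log_lines):
    """Return ids of checkpoints that were triggered but never confirmed."""
    triggered = []      # ids in first-seen order
    confirmed = set()
    for line in log_lines:
        match = CHECKPOINT_LINE.search(line)
        if match is None:
            continue
        kind, checkpoint_id = match.group(1), int(match.group(2))
        if kind == "Trigger":
            if checkpoint_id not in triggered:
                triggered.append(checkpoint_id)
        else:
            confirmed.add(checkpoint_id)
    return [cid for cid in triggered if cid not in confirmed]


# Hypothetical sample lines, not taken from the reporter's logs.
sample = [
    "DEBUG TaskExecutor - Trigger checkpoint 1@1600000000000 for task-a.",
    "DEBUG TaskExecutor - Confirm checkpoint 1@1600000000000 for task-a.",
    "DEBUG TaskExecutor - Trigger checkpoint 2@1600000060000 for task-a.",
]
print(unconfirmed_checkpoints(sample))
```

Running the same function over the log before and after the runner upgrade
would show whether every triggered checkpoint goes unconfirmed or only some
of them.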
--
This message was sent by Atlassian Jira
(v8.3.4#803005)