[
https://issues.apache.org/jira/browse/BEAM-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17353805#comment-17353805
]
Beam JIRA Bot commented on BEAM-10927:
--------------------------------------
This issue was marked "stale-P2" and has not received a public comment in 14
days. It is now automatically moved to P3. If you are still affected by it, you
can comment and move it back to P2.
> Beam Flink Runner 1.10 checkpoint failure
> -----------------------------------------
>
> Key: BEAM-10927
> URL: https://issues.apache.org/jira/browse/BEAM-10927
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Affects Versions: 2.23.0
> Reporter: Omkar Deshpande
> Priority: P3
>
> I recently upgraded from beam-runners-flink-1.9 v2.23.0 to
> beam-runners-flink-1.10 v2.23.0, and upgraded the Flink server from 1.9.3
> to 1.10.2.
> The Beam pipeline reads from KafkaIO and writes to KafkaIO, with an
> in-memory ParDo between PBegin and PDone. The application is configured to
> use S3 for checkpointing, and the state backend is RocksDB.
> This pipeline worked as expected with beam-runners-flink-1.9, but after
> upgrading to beam-runners-flink-1.10 the checkpoints keep timing out. I have
> tried increasing the timeout to several hours, but the checkpoints still
> time out.
> There are no exceptions in the log. Based on the logs, neither the
> synchronous nor the asynchronous phase of checkpointing is happening.
> Usually a "Trigger checkpoint" log statement is followed by "Confirm
> checkpoint" when the checkpoint succeeds. But with 1.10, I only see "Trigger
> checkpoint" and no confirmation of completion or even an indication of
> progress. There is enough CPU and memory available, and there is no deadlock.
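
The symptom described above (a "Trigger checkpoint" line with no matching
"Confirm checkpoint") can be checked mechanically against a task manager log.
The following is a minimal sketch, not part of the original report. It assumes
each relevant log line contains one of the two phrases quoted above followed
directly by a numeric checkpoint id; the exact log layout is an assumption, and
the function name and sample lines are hypothetical.

```python
import re

# Assumption: the phrase quoted in the report is followed by a numeric
# checkpoint id, e.g. "Trigger checkpoint 12@...". Adjust the pattern if the
# real log lines are shaped differently.
_EVENT = re.compile(r"(Trigger|Confirm) checkpoint (\d+)")


def unconfirmed_checkpoints(lines):
    """Return the sorted ids of checkpoints that were triggered but never confirmed."""
    triggered = set()
    confirmed = set()
    for line in lines:
        match = _EVENT.search(line)
        if match is None:
            continue  # not a checkpoint trigger/confirm line
        kind, checkpoint_id = match.group(1), int(match.group(2))
        if kind == "Trigger":
            triggered.add(checkpoint_id)
        else:
            confirmed.add(checkpoint_id)
    return sorted(triggered - confirmed)


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "DEBUG Trigger checkpoint 1@1000 for task-a",
        "DEBUG Confirm checkpoint 1@1000 for task-a",
        "DEBUG Trigger checkpoint 2@2000 for task-a",
    ]
    print(unconfirmed_checkpoints(sample))
```

Running this over the logs from both the 1.9 and the 1.10 deployments would show
whether every triggered checkpoint goes unconfirmed on 1.10 or only some do.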