Janek Bevendorff created BEAM-13675:
---------------------------------------

             Summary: Python SDK not creating Flink checkpoints
                 Key: BEAM-13675
                 URL: https://issues.apache.org/jira/browse/BEAM-13675
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.35.0
            Reporter: Janek Bevendorff


I have been trying to get checkpointing to work with Apache Flink and Beam for 
the Python SDK, but without any success. I read a ton of documentation on how 
to get this working, but couldn't make any progress, so I have to assume that 
this is a bug. If not, then we need to at least fix the documentation 
(fundamentally).

The "bug" is reproducible with both the PortableRunner and the FlinkRunner + 
uber JAR. I cannot really test the FlinkRunner without uber JARs, because I am 
submitting to a remote cluster.

The flink cluster is configured with:

 
{code:java}
    state.checkpoint-storage: "filesystem"
    state.checkpoints.dir: "file:///foo/bar/cp"
    state.savepoints.dir: "file:///foo/bar/sp"
    execution.checkpointing.interval: "60s"
    execution.checkpointing.externalized-checkpoint-retention: 
"DELETE_ON_CANCELLATION" {code}
({{{}/foo/bar{}}} is a shared network mount)

 

When I submit a job, all I'm seeing in the Flink Web UI under job configuration 
is

 
{noformat}
Execution mode: PIPELINED
Max. number of execution retries: Cluster level default restart strategy
Job parallelism: 120
Object reuse mode: false{noformat}
"User Configuration" is empty and no checkpoints are created (both the 
Checkpoints tab and the checkpoints folder remain empty).

 

I tried setting

 
{code:java}
checkpointing_interval=30000,
externalized_checkpoints_enabled=True, {code}
in my Beam submission config, but the result is the same.

 

When I try the FlinkRunner with {{{}flink_submit_uber_jar=True{}}}, it's the 
same again, but this time I also get the following warning and the job starts 
with a parallelism of 1 (I guess that's another bug):
{code:java}
WARNING:apache_beam.options.pipeline_options:Discarding invalid overrides: 
{'checkpointing_interval': 30000, 'externalized_
checkpoints_enabled': True, 'parallelism': 120}{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to