Hi,

you're right that this should happen automatically. The delete operation is executed by an asynchronous thread and can therefore happen somewhat later than the point at which the checkpoint is discarded. What we have seen in the past is that with S3, for example, write and delete operations can be throttled. The delete operations then pile up, but they still take place eventually. It would therefore be helpful to know which file system you checkpoint the state to. Moreover, are the checkpoint files never deleted, or only deleted slowly?

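To tell "never deleted" apart from "deleted slowly", one option is to list the checkpoint directories twice, a few minutes apart, and compare the two listings. The following is only a rough sketch, not part of Flink: the helper names are made up for illustration, and it assumes the checkpoints sit on a locally mounted path laid out as <checkpoint dir>/<job id>/chk-<n>.

```python
import re
import tempfile
from pathlib import Path

# Assumed directory naming: one "chk-<id>" directory per retained checkpoint.
CHK_DIR = re.compile(r"^chk-(\d+)$")


def checkpoint_ids(job_checkpoint_dir):
    """Return the sorted ids of all chk-<id> directories under the given path."""
    ids = []
    for entry in Path(job_checkpoint_dir).iterdir():
        match = CHK_DIR.match(entry.name)
        if match and entry.is_dir():
            ids.append(int(match.group(1)))
    return sorted(ids)


def classify(before, after):
    """Compare two listings of checkpoint ids taken some minutes apart."""
    before, after = set(before), set(after)
    if before == after:
        return "no change"       # no new checkpoint completed in between
    if not (before - after):
        return "never deleted"   # every old checkpoint is still present
    if len(after) > len(before):
        return "deleted slowly"  # deletes happen, but the backlog grows
    return "keeping up"
```

Calling checkpoint_ids on the job's checkpoint directory now and again after a handful of checkpoint intervals, then passing both results to classify, should answer the question above. The labels are coarse and only meant as a starting point.
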
For further debugging purposes it would be really helpful to get the log files of the JobManager on DEBUG log level.

Cheers,
Till

On Thu, Sep 7, 2017 at 7:25 PM, rnosworthy <[email protected]> wrote:

> Flink 1.3.2
> FileState Backend
> Currently have 1 Job Manager with 1 Task Manager
>
> I believe this should happen automatically, however there are hundreds of
> checkpoint files building up in my data directory.
>
> I have tried numerous attempts to clean up the checkpoint data via setting
> fileStateSizeThreshold when instantiating FsStateBackend object for
> environment.
>
> I have also tried to set config option 'state.checkpoints.num-retained: 5'
>
> Is there something I am doing wrong or is this a potential bug in 1.3.2?
>
> Checkpoint Config :
> Option: Value
> Checkpointing Mode: Exactly Once
> Interval: 30s
> Timeout: 10m 0s
> Minimum Pause Between Checkpoints: 0ms
> Maximum Concurrent Checkpoints: 1
> Persist Checkpoints Externally: Disabled
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
