dawidwys commented on pull request #17136: URL: https://github.com/apache/flink/pull/17136#issuecomment-988789338
Hey @zoltar9264, I am afraid this solution does not work. As far as I understand the change, when creating the state handles you make the paths relative to the top-level directory of all checkpoints. For example, given a path like `<checkpoints-dir>/<job-id>/chk-42`, you make paths relative to `<checkpoints-dir>/<job-id>` (or `<checkpoints-dir>`). However, when restoring, we treat all relative paths as relative to the location of the checkpoint's metadata. When restoring from `chk-42`, that would be `<checkpoints-dir>/<job-id>/chk-42`.

There is one more caveat: checkpoints might share files with checkpoints from previous runs. For example, `checkpoints-2/job-2/chk-43` might depend on files in the directory `checkpoints-1/job-1`. Files in the first directory would have a different base path than files in the other one.

IMO, as long as a snapshot is not self-contained, it is very fishy to treat it as relocatable. It is really hard to know which files need to be relocated, as they might be scattered across different places. I think most of these problems might be solved by the incremental savepoints we would like to introduce in the near future. Those would be self-contained and relocatable.
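
To make the base-path mismatch concrete, here is a minimal sketch using plain `java.nio.file` paths. It is not Flink code: the class `RelativePathMismatch`, the helpers `toRelative` and `restore`, and the example file `shared/state-file` are all hypothetical names chosen for illustration. It assumes only what is described above, namely that a relative path written against one base directory is later resolved against the directory holding the checkpoint's metadata.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; none of these names are Flink APIs.
class RelativePathMismatch {

    // Write side: store the file's path relative to the chosen base directory.
    static Path toRelative(Path writeBase, Path file) {
        return writeBase.relativize(file);
    }

    // Restore side: resolve a stored relative path against the directory
    // that holds the checkpoint's metadata.
    static Path restore(Path metadataDir, Path relative) {
        return metadataDir.resolve(relative).normalize();
    }

    public static void main(String[] args) {
        Path jobDir = Paths.get("/checkpoints-dir/job-id");
        Path chkDir = jobDir.resolve("chk-42");
        // Assumed example of a file that lives outside the chk-42 directory.
        Path stateFile = jobDir.resolve("shared").resolve("state-file");

        // Written relative to the job directory, restored relative to chk-42:
        // the resolved location is not where the file actually is.
        Path relToJobDir = toRelative(jobDir, stateFile);
        System.out.println(relToJobDir + " -> " + restore(chkDir, relToJobDir));

        // Written and restored against the same base: the round trip works.
        Path relToChkDir = toRelative(chkDir, stateFile);
        System.out.println(relToChkDir + " -> " + restore(chkDir, relToChkDir));
    }
}
```

The round trip only succeeds when the write side and the restore side agree on the base directory. Files inherited from a previous run (for example under `checkpoints-1/job-1`) would additionally need a relative path that climbs out of the current checkpoints directory, which no longer resolves correctly once only one of the two directories is moved.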
