Till Rohrmann created FLINK-6526:
------------------------------------

             Summary: BlobStore files might become orphans in case of recovery
                 Key: FLINK-6526
                 URL: https://issues.apache.org/jira/browse/FLINK-6526
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.3.0, 1.4.0
            Reporter: Till Rohrmann
The {{BlobStore}} is used to store {{BlobServer}} files persistently if HA is enabled. The {{BlobLibraryCacheManager}} is responsible for keeping track of a reference count for each file. Once the count reaches {{0}}, the {{BlobLibraryCacheManager}} will eventually delete the file from the {{BlobServer}} and also from the {{BlobStore}}.

In case of recovery, the {{BlobLibraryCacheManager}} only recovers those files which are actively asked for (e.g. jar files of a new job submission or of a job recovery). All other files, which might have had a reference count of {{0}} and were supposed to be deleted eventually, won't be re-registered with the {{BlobLibraryCacheManager}}. Consequently, these files will never be deleted and will remain in the {{BlobStore}} indefinitely.

I think upon recovery, all files currently held in the {{BlobStore}} should be re-registered with the {{BlobLibraryCacheManager}} such that they are eventually deleted once they have timed out with a reference count of {{0}}.
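To make the failure mode and the proposed fix concrete, here is a minimal, self-contained sketch. It is not Flink's actual {{BlobLibraryCacheManager}} API: the class {{RefCountingBlobCache}}, its methods ({{register}}, {{release}}, {{recoverAll}}, {{cleanup}}), the TTL parameter, and the explicit {{now}} timestamps are all hypothetical. The persistent {{BlobStore}} is modeled as a plain set of keys that survives a "crash", while reference counts live only in memory and are lost. {{recoverAll}} implements the suggestion above: every key found in the store that nobody has asked for yet is tracked with a count of {{0}}, so the normal timeout-based cleanup eventually removes it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a ref-counting blob cache; not Flink's real API.
class RefCountingBlobCache {
    // Stand-in for the persistent HA BlobStore: survives a failure.
    private final Set<String> blobStore;
    // In-memory reference counts: lost on failure.
    private final Map<String, Integer> refCounts = new HashMap<>();
    // Time (ms) at which a blob's count dropped to 0.
    private final Map<String, Long> zeroSince = new HashMap<>();
    private final long ttlMillis;

    RefCountingBlobCache(Set<String> blobStore, long ttlMillis) {
        this.blobStore = blobStore;
        this.ttlMillis = ttlMillis;
    }

    // A job asks for a blob (e.g. a submitted or recovered job's jar).
    public void register(String key) {
        blobStore.add(key);
        refCounts.merge(key, 1, Integer::sum);
        zeroSince.remove(key); // referenced again, so no longer a deletion candidate
    }

    // A job no longer needs the blob; at count 0 the TTL clock starts.
    public void release(String key, long now) {
        Integer count = refCounts.get(key);
        if (count == null || count == 0) {
            throw new IllegalStateException("blob not referenced: " + key);
        }
        refCounts.put(key, count - 1);
        if (count - 1 == 0) {
            zeroSince.put(key, now);
        }
    }

    // Proposed fix: after recovery, track every stored blob that nobody
    // has re-registered yet with a count of 0 so cleanup can reach it.
    public void recoverAll(long now) {
        for (String key : blobStore) {
            if (!refCounts.containsKey(key)) {
                refCounts.put(key, 0);
                zeroSince.put(key, now);
            }
        }
    }

    // Delete blobs whose count has been 0 for at least the TTL.
    public void cleanup(long now) {
        Iterator<Map.Entry<String, Long>> it = zeroSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() >= ttlMillis) {
                blobStore.remove(e.getKey());
                refCounts.remove(e.getKey());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        RefCountingBlobCache before = new RefCountingBlobCache(store, 1000);
        before.register("jobA.jar");
        before.register("jobB.jar");
        before.release("jobA.jar", 0); // count 0, due for deletion after the TTL

        // Failure before cleanup: a fresh instance sees only the persistent store.
        RefCountingBlobCache after = new RefCountingBlobCache(store, 1000);
        after.register("jobB.jar"); // jobB is recovered and asks for its jar
        after.recoverAll(10);       // without this call, jobA.jar is orphaned
        after.cleanup(2000);
        System.out.println(store);  // prints [jobB.jar]
    }
}
```

Dropping the {{recoverAll}} call from {{main}} reproduces the bug described here: {{jobA.jar}} is never tracked by the new instance, so {{cleanup}} never sees it and it stays in the store. Registering recovered blobs with a count of {{0}} (rather than deleting them immediately) keeps the usual grace period, so a job that asks for a blob shortly after recovery can still re-register it before the TTL expires.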