Till Rohrmann created FLINK-6526:
------------------------------------

             Summary: BlobStore files might become orphans in case of recovery
                 Key: FLINK-6526
                 URL: https://issues.apache.org/jira/browse/FLINK-6526
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.3.0, 1.4.0
            Reporter: Till Rohrmann


The {{BlobStore}} is used to store {{BlobServer}} files persistently if HA is 
enabled. The {{BlobLibraryCacheManager}} is responsible for keeping track of a 
reference count for each file. Once the count is {{0}} the 
{{BlobLibraryCacheManager}} will eventually delete this file from the 
{{BlobServer}} and also the {{BlobStore}}. In case of recovery, the 
{{BlobLibraryCacheManager}} will only recover those files which are actively 
asked for (e.g. the jar files of a newly submitted or recovered job). All other 
files, including those that had a reference count of {{0}} and were supposed to 
be eventually deleted, will not be re-registered with the 
{{BlobLibraryCacheManager}}. Consequently, these files will never be deleted 
and will remain in the {{BlobStore}} indefinitely.

I think that upon recovery, all files currently held in the {{BlobStore}} 
should be re-registered with the {{BlobLibraryCacheManager}} so that they 
will eventually be deleted once they have timed out with a reference count of 
{{0}}.
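
To make the problem and the proposed fix concrete, here is a minimal sketch. 
It is not Flink's actual code: {{BlobRefTracker}}, its methods, the string 
keys and the explicit {{now}} timestamps are all hypothetical names chosen for 
illustration, and a plain {{Set}} stands in for the persistent {{BlobStore}}.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; this is not Flink's BlobLibraryCacheManager. */
class BlobRefTracker {

    /** Reference count plus the time at which the count last reached 0. */
    private static final class Tracked {
        int refCount;
        long unreferencedSince;
    }

    // In-memory bookkeeping; this is the state that is lost on failover.
    private final Map<String, Tracked> entries = new HashMap<>();
    // Stands in for the persistent BlobStore, which survives failover.
    private final Set<String> store;
    private final long timeoutMillis;

    BlobRefTracker(Set<String> store, long timeoutMillis) {
        this.store = store;
        this.timeoutMillis = timeoutMillis;
    }

    /** Persists the key (if new) and takes one reference on it. */
    void acquire(String key) {
        store.add(key);
        entries.computeIfAbsent(key, k -> new Tracked()).refCount++;
    }

    /** Drops one reference; at count 0 the cleanup timeout starts at 'now'. */
    void release(String key, long now) {
        Tracked t = entries.get(key);
        if (t == null || t.refCount == 0) {
            throw new IllegalStateException("No reference held for " + key);
        }
        if (--t.refCount == 0) {
            t.unreferencedSince = now;
        }
    }

    /** Proposed recovery step: track every stored key that is not yet tracked. */
    void registerAllFromStore(long now) {
        for (String key : store) {
            entries.computeIfAbsent(key, k -> {
                Tracked t = new Tracked(); // refCount starts at 0
                t.unreferencedSince = now; // timeout restarts at recovery time
                return t;
            });
        }
    }

    /** Deletes tracked keys whose count is 0 and whose timeout has elapsed. */
    int cleanup(long now) {
        int deleted = 0;
        Iterator<Map.Entry<String, Tracked>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Tracked> e = it.next();
            Tracked t = e.getValue();
            if (t.refCount == 0 && now - t.unreferencedSince >= timeoutMillis) {
                store.remove(e.getKey());
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }
}
```

In this sketch, a second {{BlobRefTracker}} created over the same {{Set}} 
models a recovered process. Its {{cleanup}} never touches keys left behind by 
the first tracker, because it only iterates over what it tracks itself. 
Calling {{registerAllFromStore}} once after recovery makes those keys subject 
to the usual timeout again. Restarting the timeout at recovery time is one 
possible choice, since the original time at which the count reached {{0}} is 
not persisted here.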



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
