[ https://issues.apache.org/jira/browse/FLINK-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067104#comment-16067104 ]
Gustavo Anatoly commented on FLINK-6526:
----------------------------------------
Hi Till,
I've been trying to reproduce this bug with this gist:
[https://gist.github.com/gustavoanatoly/b7a29062a45201168362401346cede61]
Could you please review it?
> BlobStore files might become orphans in case of recovery
> --------------------------------------------------------
>
> Key: FLINK-6526
> URL: https://issues.apache.org/jira/browse/FLINK-6526
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Till Rohrmann
>
> The {{BlobStore}} is used to store {{BlobServer}} files persistently if HA is
> enabled. The {{BlobLibraryCacheManager}} is responsible for keeping track of
> a reference count for each file. Once the count is {{0}}, the
> {{BlobLibraryCacheManager}} will eventually delete this file from the
> {{BlobServer}} and also from the {{BlobStore}}. In case of recovery, the
> {{BlobLibraryCacheManager}} will only recover those files which are actively
> asked for (e.g. jar files of a new job submission or a job recovery). All other
> files, which might have had a reference count of {{0}} and were supposed to be
> eventually deleted, won't be re-registered with the {{BlobLibraryCacheManager}}.
> Consequently, these files will never be deleted and will remain in the
> {{BlobStore}} for all eternity.
> I think that upon recovery, all files currently held in the {{BlobStore}}
> should be re-registered with the {{BlobLibraryCacheManager}} such that they
> will eventually be deleted once they have timed out with a reference count of
> {{0}}.
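
To make the quoted description easier to follow, here is a minimal sketch of the behaviour it describes. It is not Flink's implementation: the class `RefCountingCacheSketch`, its methods, and the plain `Set<String>` standing in for the {{BlobStore}} are all hypothetical names chosen for illustration. It only models the idea that files nobody asks for after a recovery are never tracked, and that registering every stored file with a reference count of 0 would let the regular timeout-based cleanup remove them.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of reference-counted cleanup; NOT Flink's BlobLibraryCacheManager. */
class RefCountingCacheSketch {

    /** Bookkeeping for one tracked file. */
    private static final class Tracked {
        int refCount;            // number of active users of the file
        long unreferencedSince;  // time at which refCount last became 0
    }

    private final Map<String, Tracked> tracked = new HashMap<>();
    private final Set<String> persistentStore; // assumption: stands in for the BlobStore
    private final long timeoutMillis;

    RefCountingCacheSketch(Set<String> persistentStore, long timeoutMillis) {
        this.persistentStore = persistentStore;
        this.timeoutMillis = timeoutMillis;
    }

    /** A job actively asks for a file: start tracking it (if needed) and count the user. */
    void register(String key) {
        tracked.computeIfAbsent(key, k -> new Tracked()).refCount++;
    }

    /** A job no longer needs the file; remember when the count dropped to 0. */
    void release(String key, long now) {
        Tracked t = tracked.get(key);
        if (t != null && t.refCount > 0 && --t.refCount == 0) {
            t.unreferencedSince = now;
        }
    }

    /** Proposed recovery step: track every stored file, starting at reference count 0. */
    void recoverAll(long now) {
        for (String key : persistentStore) {
            tracked.computeIfAbsent(key, k -> {
                Tracked t = new Tracked();
                t.unreferencedSince = now;
                return t;
            });
        }
    }

    /** Delete tracked files that have been unreferenced for at least the timeout. */
    void cleanup(long now) {
        Iterator<Map.Entry<String, Tracked>> it = tracked.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Tracked> e = it.next();
            Tracked t = e.getValue();
            if (t.refCount == 0 && now - t.unreferencedSince >= timeoutMillis) {
                persistentStore.remove(e.getKey());
                it.remove();
            }
        }
    }
}
```

In this sketch, a file that is in the store but never passed to `register` after a restart is invisible to `cleanup`, which corresponds to the orphan case in the description. Calling `recoverAll` once after recovery makes such files eligible for deletion after the timeout, while files that are still referenced are left alone.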
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)