[ 
https://issues.apache.org/jira/browse/FLINK-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067104#comment-16067104
 ] 

Gustavo Anatoly commented on FLINK-6526:
----------------------------------------

Hi, Till...

I've been trying to reproduce this bug with the following gist:
[https://gist.github.com/gustavoanatoly/b7a29062a45201168362401346cede61]
Could you please review it?

> BlobStore files might become orphans in case of recovery
> --------------------------------------------------------
>
>                 Key: FLINK-6526
>                 URL: https://issues.apache.org/jira/browse/FLINK-6526
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Till Rohrmann
>
> The {{BlobStore}} is used to store {{BlobServer}} files persistently if HA is 
> enabled. The {{BlobLibraryCacheManager}} is responsible for keeping track of 
> a reference count for each file. Once the count is {{0}} the 
> {{BlobLibraryCacheManager}} will eventually delete this file from the 
> {{BlobServer}} and also the {{BlobStore}}. In case of recovery, the 
> {{BlobLibraryCacheManager}} will only recover those files which are actively 
> asked for (e.g. jar files of new job submission or job recovery). All other 
> files which might have had a reference count of {{0}} and were supposed to be 
> eventually deleted won't be re-registered with the {{BlobLibraryCacheManager}}. 
> Consequently, these files will never be deleted and will remain in the 
> {{BlobStore}} for all eternity.
> I think that upon recovery, all files currently held in the {{BlobStore}} 
> should be re-registered with the {{BlobLibraryCacheManager}} so that they 
> will eventually be deleted once they have timed out with a reference count of 
> {{0}}.
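
To make the scenario in the description concrete, here is a minimal sketch in plain Java. It is a simplified model, not Flink's code: the names BlobOrphanSketch, Store, RefCounter, reRegisterAll and remainingAfterRecovery are hypothetical, the persistent store is reduced to a set of keys, and the timeout before deletion is omitted. It only illustrates why a file with a reference count of 0 is never cleaned up after recovery unless the new manager learns about it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the problem; not Flink's classes or API.
class BlobOrphanSketch {

    /** Stand-in for the persistent BlobStore: only the set of stored keys. */
    static final class Store {
        final Set<String> keys = new HashSet<>();
    }

    /** Stand-in for the reference-counting cache manager (in-memory state). */
    static final class RefCounter {
        private final Map<String, Integer> counts = new HashMap<>();
        private final Store store;

        RefCounter(Store store) {
            this.store = store;
        }

        /** Adds one reference and makes sure the key is in the store. */
        void register(String key) {
            store.keys.add(key);
            counts.merge(key, 1, Integer::sum);
        }

        /** Drops one reference; the count never goes below 0. */
        void release(String key) {
            counts.computeIfPresent(key, (k, c) -> Math.max(0, c - 1));
        }

        /** Proposed recovery step: track every stored key, starting at 0. */
        void reRegisterAll() {
            for (String key : store.keys) {
                counts.putIfAbsent(key, 0);
            }
        }

        /** Deletes every tracked key whose count is 0 (timeout omitted). */
        void cleanup() {
            Iterator<Map.Entry<String, Integer>> it = counts.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> entry = it.next();
                if (entry.getValue() == 0) {
                    store.keys.remove(entry.getKey());
                    it.remove();
                }
            }
        }
    }

    /** Simulates a failure and recovery; returns the keys left in the store. */
    static Set<String> remainingAfterRecovery(boolean reRegister) {
        Store store = new Store();
        RefCounter beforeFailure = new RefCounter(store);
        beforeFailure.register("a.jar");
        beforeFailure.register("b.jar");
        beforeFailure.release("b.jar"); // count 0, waiting for cleanup

        // Failure: the in-memory counts are lost, the store survives.
        RefCounter afterRecovery = new RefCounter(store);
        afterRecovery.register("a.jar"); // only a.jar is asked for again
        if (reRegister) {
            afterRecovery.reRegisterAll();
        }
        afterRecovery.cleanup();
        return new HashSet<>(store.keys);
    }

    public static void main(String[] args) {
        System.out.println("without re-registration: " + remainingAfterRecovery(false));
        System.out.println("with re-registration:    " + remainingAfterRecovery(true));
    }
}
```

In this model, b.jar stays in the store when recovery only registers the requested file, and it is removed once all stored keys are re-registered with a count of 0 before cleanup runs.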



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)