[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations

ASF GitHub Bot (JIRA) Tue, 27 Jan 2015 08:02:32 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293723#comment-14293723
 ]


ASF GitHub Bot commented on FLINK-1419:
---------------------------------------

Github user zentol commented on the pull request:

    https://github.com/apache/flink/pull/339#issuecomment-71672497
  
    Whenever I look more closely at the DC I'm always left wondering how it can 
work at all.
    
    About your first point, I don't think that's enough. There is a more 
fundamental flaw: we need another counter for delete processes.
    
    Consider the following 2 scenarios with 2 tasks distributing the same file.
    C denotes the creation of a copy process, D the creation of a delete 
process, # denotes the count variable, and O the oldCount variable.
    
    ```
    1):   I   II  III  IV
    T1:---C--------D--------
    T2:-------C--------D---
    #     1   2    2   2
    O              2   2
    
    2)    I   II  III  IV
    T1:---C------------D---
    T2:-------C----D--------
    #     1   2    2   2
    O              2   2
    ```
    
    In both scenarios, D at III should not delete the file, but every D sees 
the very same information (# = 2, O = 2), so it cannot tell whether it is the 
last one.
    
    Instead, I propose having 2 counters: one counting the number of copy 
operations and one counting the number of delete operations, with the delete 
counter's value at process creation stored in the delete process. When the 
process executes, if its stored value equals the copy count, the files may be 
deleted, since this means that this delete process was the last to be started.
    
    Let's make another fancy diagram to illustrate the point:
    ```
    1):   I   II  III  IV
    T1:---C--------D--------
    T2:-------C--------D---
    #     1   2    2   2
    O              1   2
    
    2)    I   II  III  IV
    T1:---C------------D---
    T2:-------C----D--------
    #     1   2    2   2
    O              1   2
    ```
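
The two-counter idea could be sketched roughly as follows. This is only an illustration of the proposal, not the actual Flink code: the class name `CacheEntryCounters` and its methods are hypothetical, and it assumes every task creates exactly one copy process and one delete process per file.

```java
/** Hypothetical sketch of the two-counter proposal; not Flink's actual implementation. */
final class CacheEntryCounters {
    private int copyCount;   // number of copy processes created so far (# in the diagrams)
    private int deleteCount; // number of delete processes created so far

    /** Called when a task creates a copy process (C in the diagrams). */
    synchronized void registerCopy() {
        copyCount++;
    }

    /**
     * Called when a task creates a delete process (D in the diagrams).
     * The returned value is stored in the delete process (O in the diagrams).
     */
    synchronized int registerDelete() {
        return ++deleteCount;
    }

    /**
     * Called when a delete process executes. Only the delete process whose
     * stored value equals the copy count was the last one to be started.
     */
    synchronized boolean mayDelete(int storedDeleteCount) {
        return storedDeleteCount == copyCount;
    }
}
```

With two copies registered, the first delete process stores 1 and is refused, while the second stores 2 and is allowed to delete, which matches columns III and IV above.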


> DistributedCache doesn't preserver files for subsequent operations
> ------------------------------------------------------------------
>
>                 Key: FLINK-1419
>                 URL: https://issues.apache.org/jira/browse/FLINK-1419
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 0.8, 0.9
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>
> When subsequent operations want to access the same files in the DC it 
> frequently happens that the files are not created for the following operation.
> This is fairly odd, since the DC is supposed to either a) preserve files when 
> another operation kicks in within a certain time window, or b) just recreate 
> the deleted files. Neither happens.
> Increasing the time window had no effect.
> I'd like to use this issue as a starting point for a more general discussion 
> about the DistributedCache. 
> Currently:
> 1. all files reside in a common job-specific directory
> 2. are deleted during the job.
>  
> One thing that was brought up about Trait #1 is that it basically forbids 
> modification of the files, concurrent access and all. Personally I'm not sure 
> if this is a problem. Changing it to a task-specific place solved the issue 
> though.
> I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
> is realized with the scheduler, which adds a lot of complexity to the current 
> code. (It really is a pain to work on...) 
> If we moved the deletion to the end of the job it could be done as a clean-up 
> step in the TaskManager. With this we could reduce the DC to a 
> cacheFile(String source) method, the delete method in the TM, and throw out 
> everything else.
> Also, the current implementation implies that big files may be copied 
> multiple times. This may be undesired, depending on how big the files are.
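
The simplification suggested in the issue description (a single `cacheFile(String source)` method plus one clean-up step at the end of the job) might look roughly like the sketch below. Everything except the `cacheFile(String source)` signature is an assumption made for illustration: the class name `SimpleJobFileCache`, the `deleteAll()` method, and the choice to key copies by source path so that a file is copied only once per job. It only handles local paths, and sources with the same file name would collide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical per-job cache: copy each source once, delete everything when the job ends. */
final class SimpleJobFileCache {
    private final Path jobDir;
    private final Map<String, Path> localCopies = new ConcurrentHashMap<>();

    SimpleJobFileCache(Path jobDir) {
        this.jobDir = jobDir;
    }

    /** Returns the local copy of the source file, copying it on the first request only. */
    Path cacheFile(String source) {
        return localCopies.computeIfAbsent(source, s -> {
            try {
                Files.createDirectories(jobDir);
                Path src = Paths.get(s);
                Path target = jobDir.resolve(src.getFileName().toString());
                return Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Clean-up step, intended to run once after the job has finished. */
    void deleteAll() {
        localCopies.clear();
        if (!Files.exists(jobDir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(jobDir)) {
            // Reverse order puts children before their parent directory.
            List<Path> deepestFirst =
                    walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because nothing is deleted while the job runs, no counters or scheduled delete processes are needed in this variant; the trade-off is that files stay on disk for the whole job.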



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
