[jira] [Commented] (FLINK-5056) BucketingSink deletes valid data when checkpoint notification is slow.

ASF GitHub Bot (JIRA) Tue, 15 Nov 2016 13:06:30 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668314#comment-15668314
 ]


ASF GitHub Bot commented on FLINK-5056:
---------------------------------------

Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2797
  
    No, it's not just that one method. The method is on a whole other level; it 
is so easy to edit a commit that there simply is no reason for such a thing to 
exist in the *initial PR*.
    
    So, let's go through a few examples of purely cosmetic changes that could 
easily have been moved into a separate commit. These changes have absolutely no 
bearing on the core of the PR, which is the port to the new state interface / 
rescaling.
    * Removing `State#hasBucketState`
    * Addition, or rather full propagation of getPendingPathFor/etc. in the 
entire class
    * all the random small formatting changes: removing initializations to 
null, generic parameter changes, removing commas, adding missing spaces
    * plenty of small comment fixes
    
    Note that these changes are all *good*; it would just be nice to separate 
these *cosmetic* changes from the *functional* ones.
    
    Now, let's talk about changes that are plain unnecessary and just noise:
    * the modifications to reflectTruncate
    * the loop change in snapshotState
    * no longer storing the subtaskIndex in a field
    * no longer looping over all buckets in restoreState (now 
handleRestoredState)
    
    With a properly cleaned diff I was able to determine very quickly what has 
actually changed; this was simply not possible in the current state of the 
PR.
    
    That is what I'm referring to.



> BucketingSink deletes valid data when checkpoint notification is slow.
> ----------------------------------------------------------------------
>
>                 Key: FLINK-5056
>                 URL: https://issues.apache.org/jira/browse/FLINK-5056
>             Project: Flink
>          Issue Type: Bug
>          Components: filesystem-connector
>    Affects Versions: 1.1.3
>            Reporter: Kostas Kloudas
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0
>
>
> Currently, if BucketingSink receives no data after a checkpoint and then a 
> notification about a previous checkpoint arrives, it clears its state. This 
> can lead to valid data not being committed for intermediate checkpoints for 
> which a notification has not arrived yet. A simple sequence that illustrates 
> the problem:
> -> input data 
> -> snapshot(0) 
> -> input data
> -> snapshot(1)
> -> no data
> -> notifyCheckpointComplete(0)
> The last call will clear the state of the sink without committing as final 
> the data that arrived for checkpoint 1.
>  
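
The sequence in the issue description can be reproduced with a toy model. The sketch below is not the BucketingSink code; `ToySink`, its fields, and the `clearWhenIdle` flag are hypothetical names chosen for illustration. It assumes the sink keeps pending data per checkpoint id and, in the faulty variant, drops all of that state when a notification arrives while no new data is buffered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CheckpointNotifySketch {

    /** Toy stand-in for per-bucket sink state; hypothetical, not Flink's class. */
    static class ToySink {
        // true models the faulty behaviour described in the issue (assumption)
        private final boolean clearWhenIdle;
        // records written since the last snapshot
        private final List<String> pending = new ArrayList<>();
        // records waiting for the completion notification of each checkpoint
        private final TreeMap<Long, List<String>> pendingPerCheckpoint = new TreeMap<>();
        // records that have been committed as final
        final List<String> committed = new ArrayList<>();

        ToySink(boolean clearWhenIdle) {
            this.clearWhenIdle = clearWhenIdle;
        }

        void write(String record) {
            pending.add(record);
        }

        void snapshot(long checkpointId) {
            pendingPerCheckpoint.put(checkpointId, new ArrayList<>(pending));
            pending.clear();
        }

        void notifyCheckpointComplete(long checkpointId) {
            // Commit everything that belongs to checkpoints <= checkpointId.
            Map<Long, List<String>> done = pendingPerCheckpoint.headMap(checkpointId, true);
            for (List<String> records : done.values()) {
                committed.addAll(records);
            }
            done.clear(); // headMap is a view, so this removes the committed entries

            // Faulty variant: the sink looks idle, so all remaining state is
            // dropped, including data for checkpoints not yet acknowledged.
            if (clearWhenIdle && pending.isEmpty()) {
                pendingPerCheckpoint.clear();
            }
        }
    }

    /** Replays the sequence from the issue description up to notify(0). */
    static ToySink runSequence(boolean clearWhenIdle) {
        ToySink sink = new ToySink(clearWhenIdle);
        sink.write("a");                   // input data
        sink.snapshot(0);                  // snapshot(0)
        sink.write("b");                   // input data
        sink.snapshot(1);                  // snapshot(1)
        // no data
        sink.notifyCheckpointComplete(0);  // late notification for checkpoint 0
        return sink;
    }

    public static void main(String[] args) {
        ToySink faulty = runSequence(true);
        faulty.notifyCheckpointComplete(1);
        System.out.println("faulty committed:    " + faulty.committed);

        ToySink retaining = runSequence(false);
        retaining.notifyCheckpointComplete(1);
        System.out.println("retaining committed: " + retaining.committed);
    }
}
```

In this model the faulty variant never commits record "b", because the state for checkpoint 1 is discarded when the notification for checkpoint 0 arrives. The variant that keeps state for unacknowledged checkpoints commits both records once the notification for checkpoint 1 comes in.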



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
