[ 
https://issues.apache.org/jira/browse/BEAM-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925017#comment-15925017
 ] 

ASF GitHub Bot commented on BEAM-1721:
--------------------------------------

GitHub user tgroh opened a pull request:

    https://github.com/apache/beam/pull/2246

    [BEAM-1721] Do not shift Timestamps forwards in Reshuffle

    Timestamps can be shifted forwards after the fact, but cannot generally
    be shifted backwards. Because reshuffle outputs "as quickly as
    possible", only elements that arrive approximately simultaneously with
    each other will have their timestamps shifted.
    
    There is currently no way to output all input elements with their
    original timestamps without explicitly reifying those timestamps and
    reassigning them on the output elements.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgroh/beam reshuffle_output_time_fn

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2246
    
----
commit 347250b80fafecbdb12059233255e526370a1623
Author: Thomas Groh <[email protected]>
Date:   2017-03-14T21:05:44Z

    Do not shift Timestamps forwards in Reshuffle
    
    Timestamps can be shifted forwards after the fact, but cannot generally
    be shifted backwards. Because reshuffle outputs "as quickly as
    possible", only elements that arrive approximately simultaneously with
    each other will have their timestamps shifted.
    
    There is currently no way to output all input elements with their
    original timestamps without explicitly reifying those timestamps and
    reassigning them on the output elements.

----


> Reshuffle can shift elements in time
> ------------------------------------
>
>                 Key: BEAM-1721
>                 URL: https://issues.apache.org/jira/browse/BEAM-1721
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Thomas Groh
>            Assignee: Thomas Groh
>
> The reshuffle transform is meant to have no visible effects on the data that 
> it processes. However, due to the use of a {{GroupByKey}}, the timestamp of 
> the output elements is determined by the {{OutputTimeFn}} of the input 
> {{WindowingStrategy}}.
> Elements should not be shifted in time when being processed in {{Reshuffle}}. 
> Currently this would require reifying all timestamps before applying the 
> GroupByKey and reapplying them after. As an intermediate solution, elements 
> should never be shifted forwards in time, as doing so permits the watermark 
> to advance improperly (if the elements already contain their timestamps, for 
> example), and prevents the timestamps from being reassigned within a {{DoFn}} 
> or via the {{WithTimestamps}} transform.
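
To illustrate why a forward shift is the harmful direction, here is a minimal plain-Java sketch. It does not use the Beam SDK: the class ReshuffleTimestampSketch, the Element type, and the methods groupAtEarliest, groupAtLatest, and canRestoreOriginals are hypothetical names invented for this example, and the grouping step is only a stand-in for a GroupByKey whose output timestamp is chosen by some OutputTimeFn. It assumes, as the description states, that a timestamp may later be moved forwards but not backwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the timestamp behaviour described above; not Beam code.
class ReshuffleTimestampSketch {
    /** A value paired with its event-time timestamp (milliseconds). */
    static final class Element {
        final String value;
        final long timestamp;

        Element(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Grouping that stamps every output with the earliest input timestamp. */
    static List<Element> groupAtEarliest(List<Element> inputs) {
        long earliest = Long.MAX_VALUE;
        for (Element e : inputs) {
            earliest = Math.min(earliest, e.timestamp);
        }
        return restamp(inputs, earliest);
    }

    /** Grouping that stamps every output with the latest input timestamp (a forward shift). */
    static List<Element> groupAtLatest(List<Element> inputs) {
        long latest = Long.MIN_VALUE;
        for (Element e : inputs) {
            latest = Math.max(latest, e.timestamp);
        }
        return restamp(inputs, latest);
    }

    private static List<Element> restamp(List<Element> inputs, long timestamp) {
        List<Element> out = new ArrayList<>();
        for (Element e : inputs) {
            out.add(new Element(e.value, timestamp));
        }
        return out;
    }

    /**
     * Returns true if every original timestamp can be reassigned downstream
     * by moving forwards only, i.e. no original lies before its grouped timestamp.
     */
    static boolean canRestoreOriginals(List<Element> originals, List<Element> grouped) {
        for (int i = 0; i < originals.size(); i++) {
            if (originals.get(i).timestamp < grouped.get(i).timestamp) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Element> inputs = new ArrayList<>();
        inputs.add(new Element("a", 10L));
        inputs.add(new Element("b", 25L));
        inputs.add(new Element("c", 40L));

        // Earliest-stamped outputs can still be moved forwards to 10, 25, 40.
        System.out.println(canRestoreOriginals(inputs, groupAtEarliest(inputs)));
        // Latest-stamped outputs would need "a" and "b" to move backwards.
        System.out.println(canRestoreOriginals(inputs, groupAtLatest(inputs)));
    }
}
```

In this sketch the earliest-timestamp grouping leaves room to reassign the original timestamps afterwards, while the latest-timestamp grouping does not, which matches the intermediate solution proposed in the issue.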



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
