[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=258194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-258194 ]

ASF GitHub Bot logged work on BEAM-7428:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jun/19 22:56
            Start Date: 11/Jun/19 22:56
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-501053979
 
 
   Yes, I'm talking about ProcessContext.updateWatermark().
   
   I believe you should be able to output backwards in time; depending on
   what the watermark is, the data can be considered late and possibly dropped
   (the alternative suggested in https://issues.apache.org/jira/browse/BEAM-644).
   Since the SDF can control the watermark, it could hold the watermark
   at -inf and output at arbitrary timestamps. Note that this isn't practical
   in streaming pipelines with an unbounded input PCollection unless the
   timestamps the BoundedSource reports are correlated with the watermark.
   So I believe that your suggestion to have getCurrentTimestamp return
   "unknown" makes a lot of sense, and that we should always use the timestamp
   of the BoundedSource and fall back to the element timestamp if it is unknown.
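   A minimal sketch of the fallback rule above, assuming a hypothetical,
   simplified reader interface (NullableTimestampReader) whose
   currentTimestamp() returns null for "unknown". This is not Beam's actual
   BoundedSource.BoundedReader API, and it uses java.time.Instant rather than
   the Joda Instant Beam uses, so that it runs without Beam on the classpath.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class TimestampFallback {

  // HYPOTHETICAL simplified reader, not Beam's BoundedSource.BoundedReader.
  // currentTimestamp() returning null models the proposed "I don't know" value.
  interface NullableTimestampReader<T> {
    boolean advance();
    T current();
    Instant currentTimestamp();
  }

  // A value paired with the timestamp it is output with.
  static final class Timestamped<T> {
    final T value;
    final Instant timestamp;

    Timestamped(T value, Instant timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  // Uses the reader's timestamp when it is known and falls back to the
  // timestamp of the input element being read (e.g. the file) otherwise.
  static <T> List<Timestamped<T>> readAll(
      NullableTimestampReader<T> reader, Instant elementTimestamp) {
    List<Timestamped<T>> out = new ArrayList<>();
    while (reader.advance()) {
      Instant ts = reader.currentTimestamp();
      out.add(new Timestamped<>(reader.current(), ts != null ? ts : elementTimestamp));
    }
    return out;
  }

  // Helper for trying the sketch: a reader over parallel lists of values
  // and (possibly null) timestamps.
  static <T> NullableTimestampReader<T> fromLists(List<T> values, List<Instant> timestamps) {
    return new NullableTimestampReader<T>() {
      private int i = -1;

      public boolean advance() {
        return ++i < values.size();
      }

      public T current() {
        return values.get(i);
      }

      public Instant currentTimestamp() {
        return timestamps.get(i);
      }
    };
  }
}
```

   For example, reading ["a", "b"] with reader timestamps [5s, null] from a
   file element stamped at 100s would output "a" at 5s and "b" at 100s.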
   
   But what should we be updating the watermark to?
   * smallest timestamp in the bounded source?
   * largest timestamp in the bounded source?
   * ...?
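   A rough sketch of what each candidate would mean for the elements of one
   bounded source. It makes a simplifying assumption: an element counts as
   "behind" the watermark if its timestamp is strictly earlier than the
   watermark (in Beam, actual lateness also depends on windowing and allowed
   lateness). The class and method names are hypothetical, and
   java.time.Instant.MIN stands in for -inf.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.List;

public class WatermarkCandidates {

  // Counts elements whose timestamp is strictly earlier than the watermark
  // (simplified notion of "behind the watermark").
  static long countBehind(List<Instant> timestamps, Instant watermark) {
    return timestamps.stream().filter(ts -> ts.isBefore(watermark)).count();
  }

  // Candidate 1: the smallest timestamp in the bounded source.
  static Instant smallest(List<Instant> timestamps) {
    return Collections.min(timestamps);
  }

  // Candidate 2: the largest timestamp in the bounded source.
  static Instant largest(List<Instant> timestamps) {
    return Collections.max(timestamps);
  }

  public static void main(String[] args) {
    List<Instant> ts = List.of(
        Instant.ofEpochSecond(30), Instant.ofEpochSecond(10), Instant.ofEpochSecond(20));
    System.out.println(countBehind(ts, smallest(ts))); // 0
    System.out.println(countBehind(ts, largest(ts)));  // 2
    System.out.println(countBehind(ts, Instant.MIN));  // 0 (holding at -inf)
  }
}
```

   Holding the watermark at the smallest timestamp (or at -inf) keeps every
   element of the source ahead of the watermark, at the cost of holding back
   downstream watermark progress; advancing it to the largest timestamp puts
   all earlier elements behind it.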
   
   
   On Tue, Jun 11, 2019 at 2:44 PM Eugene Kirpichov <[email protected]>
   wrote:
   
   > I'm not sure I follow. It sounds like you're talking about
   > ProcessContext.updateWatermark()? It is true that this API is only
   > implemented for SDFs (even though technically we could support it for
   > regular DoFns too), but I don't see how it answers the question of this PR:
   > which timestamp should we output with? That API still does not permit
   > outputting backwards in time.
   >
   > Perhaps the correct thing to do is to make
   > BoundedSource.getCurrentTimestamp be @Nullable, where null means "I don't
   > know the timestamp". If it is null, use the source's timestamp;
   > otherwise, use the provided timestamp. That seems better than the current
   > treatment, where BoundedSource by default pretends that all its elements
   > are actually infinitely old.
   
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 258194)
    Time Spent: 4h 50m  (was: 4h 40m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-7428
>                 URL: https://issues.apache.org/jira/browse/BEAM-7428
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse, which tackles a
> similar problem but does output the timestamps correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
