[ 
https://issues.apache.org/jira/browse/BEAM-7240?focusedWorklogId=243346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243346
 ]

ASF GitHub Bot logged work on BEAM-7240:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/May/19 13:57
            Start Date: 16/May/19 13:57
    Worklog Time Spent: 10m 
      Work Description: ajothomas commented on issue #8513: [BEAM-7240] Kinesis 
IO Watermark Computation Improvements
URL: https://github.com/apache/beam/pull/8513#issuecomment-493078621
 
 
   Hey @aromanenko-dev, thanks for your feedback. I have addressed your 
comments. Apologies for rebasing the master on to the feature branch. I have 
cleaned it up by rebasing.
   
   As for testing, I have used the IO to write pipelines and tested it out 
through a Flink runner. The pipeline consumes data from a high traffic Kinesis 
stream (approx. 100 records/sec being pushed to the stream) and uses the 
default watermark policy which takes the record arrival time of the record as 
the event time. It works fine and watermarks seem to advance as 
expected(checked watermarks in Flink's UI and it keeps getting updated to 
`Instant.now()`).
   
   For low traffic streams, I also wrote a pipeline to consume from a low 
traffic stream where I push records to the stream at intermittent intervals. 
With the default watermark policy, the watermark advances to `Instant.now()` 
whenever I push a new record. Whenever there is a period when no record was 
pushed, the watermark still advances to `Instant.now() - 2 mins`. The default 
watermark lag threshold defined in the IO is 2 mins and if there are no records 
pushed in that period, the watermark still advances. This is the default 
behavior and the threshold value is configurable.
   
   Hope I was clear. Please let me know your thoughts.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 243346)
    Time Spent: 50m  (was: 40m)

> Kinesis IO Watermark Computation Improvements
> ---------------------------------------------
>
>                 Key: BEAM-7240
>                 URL: https://issues.apache.org/jira/browse/BEAM-7240
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kinesis
>            Reporter: Ajo Thomas
>            Assignee: Ajo Thomas
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, watermarks in kinesis IO are computed taking into account the 
> record arrival time in a {{KinesisRecord}}. The arrival time might not always 
> be the right representation of the event time. The user of the IO should be 
> able to specify how they want to extract the event time from the 
> KinesisRecord. 
> As the per current logic, the end user of the IO cannot control watermark 
> computation in any way. A user should be able to control watermark 
> computation through some custom heuristics or configurable params like time 
> duration to advance the watermark if no data was received (could be due to a 
> shard getting stalled.  The watermark should advance and not be stalled in 
> that case).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to