Paweł Kaczmarczyk created BEAM-2467:
---------------------------------------

             Summary: KinesisIO watermark based on approximateArrivalTimestamp
                 Key: BEAM-2467
                 URL: https://issues.apache.org/jira/browse/BEAM-2467
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-extensions
            Reporter: Paweł Kaczmarczyk
            Assignee: Davor Bonaci


In Kinesis we can start reading a stream from a point in the past, anywhere 
within the retention period (up to 7 days). With the current approach, a 
record's timestamp and the watermark are both always set to the current time 
(i.e. Instant.now()), so we can't observe the actual position in the stream.

So the idea is to change this behaviour and set the record timestamp based on 
the 
[ApproximateArrivalTimestamp|http://docs.aws.amazon.com/kinesis/latest/APIReference/API_Record.html#Streams-Type-Record-ApproximateArrivalTimestamp].
 The watermark will then be set according to the timestamp of the last record 
read.
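
To make the proposal concrete, here is a minimal sketch in plain Java. It is 
not the KinesisIO implementation: the class ArrivalTimeWatermark and its 
methods are hypothetical names, and starting the watermark at Instant.MIN 
before any record is read is an assumption made only for illustration.

```java
import java.time.Instant;

/** Hypothetical helper (not part of KinesisIO): derives timestamps and watermark from arrival times. */
final class ArrivalTimeWatermark {
    // Assumption: before any record is read, report the earliest possible instant.
    private Instant watermark = Instant.MIN;

    /**
     * Called for each record read from the stream.
     * Returns the timestamp to assign to the record, which is its
     * approximate arrival timestamp rather than Instant.now().
     */
    Instant onRecord(Instant approximateArrivalTimestamp) {
        // Follows the proposal literally: the watermark tracks the last read record.
        watermark = approximateArrivalTimestamp;
        return approximateArrivalTimestamp;
    }

    /** Current watermark, i.e. the timestamp of the last record read. */
    Instant current() {
        return watermark;
    }
}
```

With this shape, a reader that starts several days back in the retention 
period reports a watermark several days in the past, instead of the current 
time.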

ApproximateArrivalTimestamp is only an approximation, so records may be read 
with out-of-order timestamps, which in turn may cause some events to be marked 
as late. However, this should not happen often, and when it does the 
difference should be a matter of milliseconds or seconds, so it can be handled 
even with a small allowedLateness setting.
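
The lateness argument can be sketched as a simple comparison. This is a 
simplified model, not Beam's actual late-data logic (which is defined in terms 
of windows): the class LatenessCheck and its methods are hypothetical, and it 
only measures how far a record's timestamp falls behind the watermark.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: simplified check of how far a record lags the watermark. */
final class LatenessCheck {
    /** How far the record's timestamp is behind the watermark; zero if it is not behind. */
    static Duration lag(Instant watermark, Instant recordTimestamp) {
        Duration behind = Duration.between(recordTimestamp, watermark);
        return behind.isNegative() ? Duration.ZERO : behind;
    }

    /** True if the record's lag does not exceed the allowed lateness. */
    static boolean withinAllowedLateness(
            Instant watermark, Instant recordTimestamp, Duration allowedLateness) {
        return lag(watermark, recordTimestamp).compareTo(allowedLateness) <= 0;
    }
}
```

Under this model, a record that arrives 200 ms behind the watermark is still 
accepted with an allowed lateness of one second, which is the scale of 
disorder the issue expects.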



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
