Paweł Kaczmarczyk created BEAM-2467:
---------------------------------------
Summary: KinesisIO watermark based on approximateArrivalTimestamp
Key: BEAM-2467
URL: https://issues.apache.org/jira/browse/BEAM-2467
Project: Beam
Issue Type: Improvement
Components: sdk-java-extensions
Reporter: Paweł Kaczmarczyk
Assignee: Davor Bonaci
In Kinesis we can start reading the stream at some point in the past during the
retention period (up to 7 days). With current approach for setting record's
timestamp and watermark (both are always set to current time, i.e.
Instant.now()), we can't observe the actual position in the stream.
So the idea is to change this behaviour and set the record timestamp based on
the
[ApproximateArrivalTimestamp|http://docs.aws.amazon.com/kinesis/latest/APIReference/API_Record.html#Streams-Type-Record-ApproximateArrivalTimestamp].
Watermark will be set accordingly to the last read record's timestamp.
ApproximateArrivalTimestamp is still some approximation and may result in
having records with out-of-order timestamp's which in turn may result in some
events marked as late. This however should not be a frequent issue and even if
it happens it should be a matter of milliseconds or seconds so can be handled
even with a tiny allowedLateness setting
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)