Edward Rojas created FLINK-11337:
------------------------------------
Summary: Incorrect watermark in StreaminFileSink
BucketAssigner.Context when used in connected stream
Key: FLINK-11337
URL: https://issues.apache.org/jira/browse/FLINK-11337
Project: Flink
Issue Type: Bug
Components: filesystem-connector
Affects Versions: 1.7.0
Reporter: Edward Rojas
When StreamingFileSink is used as sink of a connected stream the "invoke"
method of the sink could be called before the "combinedWatermark" is updated
with the timestamp of the element currently being processed, resulting on the
context containing the incorrect watermark value (the Long.MIN_VALUE when using
"AssignerWithPeriodicWatermarks" for the firsts events in the stream).
I reproduce this when using a broadcast stream connected to a data stream. The
broadcast stream is using a custom timestamp extractor that always return the
Watermark.MAX_VALUE as it's done in a trining example here:
[https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/solutions/datastream_java/broadcast/OngoingRidesSolution.java#L143.]
This is problematic as the watermark could not be used reliably to compute the
bucket id based on event time.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)