Ahh, ok, I see. Yes, it looks like a bug. So, I'd propose to deprecate the old "processing time” watermark policy, which we can remove later, and create a new fixed one.
PS: It’s recommended to use "org.apache.beam.sdk.io.aws2.kinesis.KinesisIO” instead of deprecated “org.apache.beam.sdk.io.kinesis.KinesisIO” one. — Alexey > On 27 Oct 2023, at 17:42, Jan Lukavský <je...@seznam.cz> wrote: > > No, I'm referring to this [1] policy which has unexpected (and hardly > avoidable on the user-code side) data loss issues. The problem is that > assigning timestamps to elements and watermarks is completely decoupled and > unrelated, which I'd say is a bug. > > Jan > > [1] > https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kinesis/KinesisIO.Read.html#withProcessingTimeWatermarkPolicy-- > > On 10/27/23 16:51, Alexey Romanenko wrote: >> Why not just to create a custom watermark policy for that? Or you mean to >> make it as a default policy? >> >> — >> Alexey >> >>> On 27 Oct 2023, at 10:25, Jan Lukavský <je...@seznam.cz> >>> <mailto:je...@seznam.cz> wrote: >>> >>> >>> Hi, >>> >>> when discussing about [1] we found out, that the issue is actually caused >>> by processing time watermarks in KinesisIO. Enabling this watermark outputs >>> watermarks based on current processing time, _but event timestamps are >>> derived from ingestion timestamp_. This can cause unbounded lateness when >>> processing backlog. I think this setup is error-prone and will likely cause >>> data loss due to dropped elements. This can be solved in two ways: >>> >>> a) deprecate processing time watermarks, or >>> >>> b) modify KinesisIO's watermark policy so that is assigns event timestamps >>> as well (the processing-time watermark policy would have to derive event >>> timestamps from processing-time). >>> >>> I'd prefer option b) , but it might be a breaking change, moreover I'm not >>> sure if I understand the purpose of processing-time watermark policy, it >>> might be essentially ill defined from the beginning, thus it might really >>> be better to remove it completely. There is also a related issue [2]. >>> >>> Any thoughts on this? >>> >>> Jan >>> >>> [1] https://github.com/apache/beam/issues/25975 >>> >>> [2] https://github.com/apache/beam/issues/28760 >>> >>