psolomin commented on issue #25975:
URL: https://github.com/apache/beam/issues/25975#issuecomment-1780742092

   @je-ik 
   
   > Processing time watermark will move watermark significantly on restore.
   
   I tried to drop it, but left the window the simple one:
   
   >                 .apply("Fixed windows", 
Window.<KinesisRecord>into(FixedWindows.of(Duration.standardSeconds(60))));
   
   This actually made all records coming through, even when I reduced 
parallelism.
   
   > if you use the default watermarking policy and set allowed lateness as 
suggests [this comment]
   
   This comment is hard to find - it's not in the `KinesisIO` javadoc.
   
   Overall, I am feeling very confused cause it took me multiple trials to 
understand what combos of watermark policy and windowing are no-dataloss ones. 
When developing a pipeline, I do `.withProcessingTimeWatermarkPolicy()` to 
express "I do not care when records were produced, I just want to pull them and 
put into windows upon their time of arrival into my app". I think there's a 
need to do certain IO changes, I see these options:
   
   1. Add a warning to `withProcessingTimeWatermarkPolicy` javadoc saying that 
it causes data loss unless one adds custom processing time timestamps to the 
records
   2. Make the IO sending records with `Instance.now()` timestamps if 
`withProcessingTimeWatermarkPolicy` is set (let's call it KafkaIO-way)
   3. Deprecate `withProcessingTimeWatermarkPolicy` due to potential data losses
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to