psolomin commented on issue #25975:
URL: https://github.com/apache/beam/issues/25975#issuecomment-1774933051

   @je-ik 
   
   Thanks for the tip. I've enabled the logs you mentioned, and I've seen logs 
like this after reducing parallelism 3 -> 2:
   
   > LateDataFilter: Dropping element at 2023-10-23T09:47:44.305Z for key:key: 
0 shard: 0; window:[2023-10-23T09:47:00.000Z..2023-10-23T09:48:00.000Z) since 
too far behind inputWatermark:2023-10-23T09:50:03.009Z; 
outputWatermark:2023-10-23T09:50:03.009Z
   
   The other logs tell
   
   > WindowTracing - describePane: ON_TIME
   
   and they all are `ON_TIME` which contradicts `Dropping element ...` logs.
   
   >  It might be the case that watermarks are not reconstructed correctly in 
this case in KinesisIO.
   
   Is it possible this issue is related to 
https://github.com/apache/beam/issues/28760?
   
   I also tried to assign timestamps to records:
   
   ```
   public class AssignTimestampsDoFn extends DoFn<KinesisRecord, KinesisRecord> 
{
       @ProcessElement
       public void processElement(@Element KinesisRecord input, 
OutputReceiver<KinesisRecord> out) {
           out.outputWithTimestamp(input, Instant.now());
       }
   
       @Override
       public Duration getAllowedTimestampSkew() {
           return Duration.standardSeconds(5);
       }
   }
   
   ```
   
   ```
   PCollection<KinesisRecord> windowedRecords = p.apply("Source", reader)
                   .apply("Assign ts", ParDo.of(new AssignTimestampsDoFn()))
                   .apply("Fixed windows", 
Window.<KinesisRecord>into(FixedWindows.of(Duration.standardSeconds(60))));
   
   ```
   
   After this, reducing parallelism was not causing data loss anymore.
   
   What might be the next steps, in your opinion? Moving forward with fixing 
https://github.com/apache/beam/issues/28760 in the way @sjmittal proposed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to