bzablocki commented on issue #32596:
URL: https://github.com/apache/beam/issues/32596#issuecomment-2407685063

   > With redeliveries I also was wondering why are they not filtered out as
   > duplicates (the requiresDedup property of the IO transforms in Beam)? I checked
   > it specifically that the message id is the same between the original message
   > and the same message when it is being redelivered but Beam/Dataflow just sends
   > it over to our pipeline twice.. so I'm a bit confused about this deduplication
   > logic.
   
   I assume you set `SolaceIO.Read#withDeduplicateRecords()` to `true`?
   
   This adds a Reshuffle step based on an id. The id in this case is defined here:
   
   
https://github.com/apache/beam/blob/e7ec432db7bf4d7c0b8c77a1dc5f54acab903462/sdks/java/io/solace/src/main/java/org/apache/beam/sdk/io/solace/SolaceIO.java#L560-L569
   
   FYI, the added Reshuffle step is in the Deduplicate transform:
   
   
https://github.com/apache/beam/blob/e7ec432db7bf4d7c0b8c77a1dc5f54acab903462/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L2174-L2175
   
   Could you check whether the id you are comparing is the same id that is used for deduplication?
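   
   To illustrate why that matters, here is a minimal sketch in plain Java. It is not Beam's or Dataflow's actual implementation, and the `Msg` class and its `inspectedId` / `dedupId` fields are hypothetical names made up for this example. It only shows that if a redelivered message keeps the id you inspected but gets a different value for the id used for deduplication, an id-based dedup step will let both copies through:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class DedupSketch {
    // Hypothetical stand-in for a received message; not a Beam or Solace class.
    public static final class Msg {
        public final String inspectedId; // assumption: the id checked by hand
        public final String dedupId;     // assumption: the id the dedup step uses

        public Msg(String inspectedId, String dedupId) {
            this.inspectedId = inspectedId;
            this.dedupId = dedupId;
        }
    }

    // Keeps only the first element seen for each id returned by idFn.
    public static <T> List<T> dedupBy(List<T> input, Function<T, String> idFn) {
        Set<String> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T element : input) {
            // Set.add returns false when the id was already present.
            if (seen.add(idFn.apply(element))) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Original delivery and a redelivery: same inspected id, different dedup id.
        List<Msg> deliveries = List.of(
            new Msg("msg-42", "rg-1"),
            new Msg("msg-42", "rg-2"));

        // Deduplicating on the id you looked at collapses the two copies.
        System.out.println(dedupBy(deliveries, m -> m.inspectedId).size()); // 1

        // Deduplicating on the other id keeps both, so the pipeline sees the message twice.
        System.out.println(dedupBy(deliveries, m -> m.dedupId).size()); // 2
    }
}
```

   Whether the two ids actually differ in your case is exactly what needs checking against the code linked above.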
   

