xzhang2sc commented on issue #32461:
URL: https://github.com/apache/beam/issues/32461#issuecomment-2354083323

   I find this 
[assumption](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L95C10-L100C7)
 quite problematic, and the consequences of a wrong watermark are 
dramatic. 
   
   > This assumes Pubsub delivers the oldest (in Pubsub processing time) 
available message at least once a minute
   
   If Pubsub did not deliver an old message during the past minute, the 
estimated watermark will be wrong (too far ahead). Once the watermark has 
progressed past those old messages, they don't get acked properly and are 
delivered repeatedly. 
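
To make the failure mode concrete, here is a minimal sketch of a watermark estimated as the minimum event timestamp among messages received in the last minute. It is not the actual `PubsubUnboundedSource` code; the class `WatermarkSketch`, the `Sample` type, and `estimateWatermark` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class WatermarkSketch {
    /** Hypothetical record of one received message (times in epoch millis). */
    static final class Sample {
        final long receivedAtMs;
        final long eventTimeMs;

        Sample(long receivedAtMs, long eventTimeMs) {
            this.receivedAtMs = receivedAtMs;
            this.eventTimeMs = eventTimeMs;
        }
    }

    /**
     * Assumed estimator: the minimum event timestamp among messages received
     * in the window (nowMs - windowMs, nowMs]. Returns Long.MIN_VALUE if no
     * message was received in that window.
     */
    static long estimateWatermark(List<Sample> samples, long nowMs, long windowMs) {
        long min = Long.MAX_VALUE;
        for (Sample s : samples) {
            if (s.receivedAtMs > nowMs - windowMs && s.receivedAtMs <= nowMs) {
                min = Math.min(min, s.eventTimeMs);
            }
        }
        return min == Long.MAX_VALUE ? Long.MIN_VALUE : min;
    }

    public static void main(String[] args) {
        long oneMinuteMs = 60_000L;
        long nowMs = 120_000L;

        // Only recent messages were delivered in the last minute. An older
        // message with event time 0 is still in the subscription but was
        // not delivered during this window.
        List<Sample> delivered = new ArrayList<>();
        delivered.add(new Sample(100_000L, 90_000L));
        delivered.add(new Sample(110_000L, 95_000L));

        long watermark = estimateWatermark(delivered, nowMs, oneMinuteMs);
        long oldEventTimeMs = 0L;

        System.out.println("estimated watermark = " + watermark);
        // The old message is already behind the watermark when it finally arrives.
        System.out.println("old message behind watermark = " + (oldEventTimeMs < watermark));
    }
}
```

In this sketch the estimate only sees what was actually delivered, so any old message skipped during the window ends up behind the watermark when it is eventually delivered.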
   
   In summary, I think there are two problems:
   1. The inaccuracy in the estimated watermark results in old messages not 
being acked.
   2. The ack message count metric doesn't match the actual number of acked 
messages. The metric seems much higher than the actual acked message 
count.
   


