bezmax edited a comment on issue #6161: Cloning subscriptions
URL: https://github.com/apache/pulsar/issues/6161#issuecomment-581418529
 
 
   @yjshen Yes, that would work. However, unless I'm missing something, it will 
result in a lot of duplicates being reprocessed. I'm OK with duplicates, but in my 
case messages can have very different processing times (some need 
enrichment, some do not), so this would result in hundreds of thousands of 
duplicates being reprocessed.
   
   For example, suppose message 1 needs a second-long enrichment (with an async 
operator), and in that time 100k other messages get processed. If a snapshot is 
taken before the first message has finished being enriched, the reader position 
will still be before it, because we can't durably advance the cursor while that 
message is stuck in front. So on failure, instead of reprocessing just one 
message, the whole 100k will need to be processed again.
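
   To make the arithmetic concrete, here is a toy model of a cursor that can 
only move past a contiguous run of acknowledged messages. It is a sketch only: 
`mark_delete_position` and the integer message ids are made up for illustration 
and are not the Pulsar or Flink API.

```python
def mark_delete_position(acked, total):
    """Return the highest id N such that messages 1..N are all acknowledged."""
    position = 0
    while position < total and (position + 1) in acked:
        position += 1
    return position


TOTAL = 100_001
# Message 1 is still being enriched; messages 2..100001 have already finished.
acked = set(range(2, TOTAL + 1))

# The cursor cannot pass the unacknowledged message 1, so it stays at 0.
cursor = mark_delete_position(acked, TOTAL)

# A restart from this cursor re-reads everything after it.
replayed_after_failure = TOTAL - cursor

print(cursor)                  # 0
print(replayed_after_failure)  # 100001
```

   Only message 1 is actually unfinished, yet a restart from the snapshotted 
position replays it together with all 100k messages behind it.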
   
   Again, maybe my understanding of Flink/Pulsar is incomplete, but I feel like 
the only way to avoid this behavior is to copy the state of the whole 
subscription (its real backlog + watermark).
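
   As a rough sketch of what "copying the whole subscription state" would 
have to capture, assume the state is a position plus the set of messages 
acknowledged beyond it. `SubscriptionState`, `acked_ahead` and `clone` are 
hypothetical names for illustration, not Pulsar's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubscriptionState:
    mark_delete: int  # every message id <= mark_delete is acknowledged
    acked_ahead: frozenset = field(default_factory=frozenset)  # acked ids beyond it

    def backlog(self, last_id):
        """Ids up to last_id that still need processing."""
        return [m for m in range(self.mark_delete + 1, last_id + 1)
                if m not in self.acked_ahead]


def clone(state):
    """Copy both the position and the out-of-order acknowledgements."""
    return SubscriptionState(state.mark_delete, frozenset(state.acked_ahead))


# Message 1 is still in flight; messages 2..100001 are done.
live = SubscriptionState(mark_delete=0, acked_ahead=frozenset(range(2, 100_002)))
snapshot = clone(live)

print(snapshot.backlog(last_id=100_001))  # [1]
```

   Under that assumption, restoring from the cloned state leaves a backlog of 
just the one unfinished message instead of the full 100k.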

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
