[ 
https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337486
 ]

ASF GitHub Bot logged work on BEAM-8347:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Nov/19 19:01
            Start Date: 01/Nov/19 19:01
    Worklog Time Spent: 10m 
      Work Description: drobert commented on issue #9820: [BEAM-8347]: 
Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#issuecomment-548912008
 
 
   @jkff I looked at the PubSub implementation as well and opted against it 
*for now* for a few reasons:
   1. It has an open ticket for the watermark not advancing with low volumes 
(https://issues.apache.org/jira/browse/BEAM-7322) which is the issue I set out 
to solve here for rabbit
   2. It's fairly complicated relative to what I have here for rabbit. Given 
the difficulties in testing watermarking for an unbounded source, I didn't 
relish making this a blocker.
   3. Rabbit has different semantics than pubsub. I actually think the 
processing time watermark is correct and simple for rabbit.
   
   I'm very open to exploring better watermarking approaches in the future, but 
I think the current one is broken and I'd like to get a fix in first and 
improvements to it in second, if that makes sense.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 337486)
    Time Spent: 2h 20m  (was: 2h 10m)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-8347
>                 URL: https://issues.apache.org/jira/browse/BEAM-8347
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-rabbitmq
>    Affects Versions: 2.15.0
>         Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>            Reporter: Daniel Robert
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()` if there are no messages, no state changes, 
> including no changes to the CheckpointMark or Watermark.  If there is a 
> relatively constant rate of new messages coming in, this is not a problem. If 
> data is bursty, and there are periods of no new messages coming in, the 
> watermark will never advance.
> Contrast this with some of the logic in PubsubIO which will make provisions 
> for periods of inactivity to advance the watermark (although it, too, is 
> imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>   .apply(RabbitMqIO.read()
>   .withUri("amqp://guest:guest@localhost:5672")
>   .withQueue("test")
>   .apply("Windowing", 
>     Window.<RabbitMqMessage>into(
>       FixedWindows.of(Duration.standardSeconds(10)))
>     .triggering(AfterWatermark.pastEndOfWindow())
>     .withAllowedLateness(Duration.ZERO)
>     .accumulatingFiredPanes()){code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
> window that never performs an on time trigger.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to