On Thu, Aug 25, 2016 at 1:13 PM, Thomas Groh <[email protected]>
wrote:

>
> We should still have a JIRA to improve the KafkaIO watermark tracking in
> the absence of new records .
>

filed https://issues.apache.org/jira/browse/BEAM-591

I don't want to hijack this thread Sumit's primary issue, but want to
mention related concerns here, which could be discussed on a new thread or
on the jira:
from the jira description :

A user can provide functions to calculate watermark and record timestamps.
There are a few concerns with current design:

   - What should happen when a kafka topic is idle:
      - in default case, I think watermark should advance to current time.
      - What should happen when user has provided a function to calculate
      record timestamp?
         - Should the watermark stay same as record timestamp?
         - same when user has provided own watermark function?
      - Are the current semantics of user provided watermark function
   correct?
      - it is run once for each record read.
      - Should it instead be run inside getWatermark() called by the runner
      (we could still provide the last user record, and its timestamp).

Reply via email to