On Thu, Aug 25, 2016 at 1:13 PM, Thomas Groh <[email protected]> wrote:
> > We should still have a JIRA to improve the KafkaIO watermark tracking in > the absence of new records . > filed https://issues.apache.org/jira/browse/BEAM-591 I don't want to hijack this thread Sumit's primary issue, but want to mention related concerns here, which could be discussed on a new thread or on the jira: from the jira description : A user can provide functions to calculate watermark and record timestamps. There are a few concerns with current design: - What should happen when a kafka topic is idle: - in default case, I think watermark should advance to current time. - What should happen when user has provided a function to calculate record timestamp? - Should the watermark stay same as record timestamp? - same when user has provided own watermark function? - Are the current semantics of user provided watermark function correct? - it is run once for each record read. - Should it instead be run inside getWatermark() called by the runner (we could still provide the last user record, and its timestamp).
