Thanks for the update and summary.



On Oct 18, 2016, 20:47, at 20:47, Amit Sela <> wrote:
>I wanted to summarize here a couple of important points raised in some
>I was involved with, and while those PRs were about KafkaIO and related
>the Spark/Direct runners, some important notes were made and I think
>are worth this thread.
>The KafkaIO waits (5 seconds) before starting to read, and (10 millis)
>between advancing the reader, which is problematic for the Spark runner
>it might attempt to read (every microbatch) for a shorter period, and
>so it
>will never start.
>Raghu Angadi mentioned in this conversation
><> that originally
>wait period was set to 10 millis for both start() and advance(), but it
>changed mostly due to DirectRunner issues.
>PR#1125 <> originally
>attempted to allow runners to configure those properties via
>PipelineOptions, but Dan Halperin raised some interesting points:
>   - Readers should return as soon as they are able.
> - Runners may poll advance() in a loop for a certain period of time if
>   it returned too fast.
>   - Runners must tolerate sources that take a long time to start or
>   advance, because real systems operate that way.
>Dan suggested (and I clearly agreed) that this should be discussed
>BTW, Thomas Groh mentioned that the DirectRunner is OK now with much
>shorter wait periods, and PR#1125 now aims to re-set the 5 seconds
>wait-on-start to 10 millis.
>I think this is a good example of Dan's points here:  Kafka reader can
>clearly return much faster (millis not seconds) and the DirectRunner
>accommodates this now by reusing it's readers.
>I Hope I didn't forget/misunderstood anything (please correct me if I

Reply via email to