Amit Sela created BEAM-1294:
-------------------------------

             Summary: Long running UnboundedSource Readers via Broadcasts
                 Key: BEAM-1294
                 URL: https://issues.apache.org/jira/browse/BEAM-1294
             Project: Beam
          Issue Type: Improvement
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela


When reading from an UnboundedSource, the current implementation causes each 
split to create a new Reader in every micro-batch.

As long as the overhead of creating a reader is relatively low, this is 
reasonable (though I'd still be happy to get rid of it), but when the creation 
overhead is large it becomes unreasonable and forces large batches.

One way to solve this could be to create a pool of lazy-init readers to serve 
each executor, maybe via Broadcast variables. 
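
To illustrate the idea, here is a minimal sketch of a lazily initialized 
reader pool. It is not the Spark runner's code: LazyReaderPool and the 
split IDs are hypothetical names, plain strings stand in for real 
UnboundedSource readers, and how the pool would actually reach each executor 
(a Broadcast variable or otherwise) is left open.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical pool that creates one reader per split, on first use only. */
public class LazyReaderPool<K, R> {
    private final ConcurrentHashMap<K, R> readers = new ConcurrentHashMap<>();
    private final Function<K, R> factory;

    public LazyReaderPool(Function<K, R> factory) {
        this.factory = factory;
    }

    /** Returns the cached reader for the split, creating it if absent. */
    public R get(K splitId) {
        return readers.computeIfAbsent(splitId, factory);
    }

    /** Number of readers created so far. */
    public int size() {
        return readers.size();
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        // The factory stands in for an expensive reader creation.
        LazyReaderPool<String, String> pool = new LazyReaderPool<>(id -> {
            created.incrementAndGet();
            return "reader-" + id;
        });

        // Simulate three micro-batches over two splits.
        for (int batch = 0; batch < 3; batch++) {
            pool.get("split-0");
            pool.get("split-1");
        }
        System.out.println("readers created: " + created.get());
    }
}
```

In this sketch the creation cost is paid once per split rather than once per 
split per micro-batch. Closing readers and restoring from checkpoints are 
not covered.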



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
