Amit Sela created BEAM-1294:
-------------------------------
Summary: Long running UnboundedSource Readers via Broadcasts
Key: BEAM-1294
URL: https://issues.apache.org/jira/browse/BEAM-1294
Project: Beam
Issue Type: Improvement
Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela
When reading from an UnboundedSource, the current implementation causes each
split to create a new Reader in every micro-batch.
As long as the overhead of creating a reader is relatively low, this is tolerable
(though I'd still be happy to get rid of it), but when the creation
overhead is large it becomes unreasonable, because only large batches can
amortize the cost.
One way to solve this could be to create a pool of lazy-init readers to serve
each executor, maybe via Broadcast variables.
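A rough sketch of the pooling idea, not tied to the actual runner code: one
pool per executor JVM caches a reader per split and creates it only on first
use, so later micro-batches reuse it. LazyReaderPool, readerFor and the
"split-0" key are hypothetical names, and a StringBuilder stands in for a real
reader. How the pool would reach the executors (a Broadcast variable or
otherwise) and how readers would be closed or checkpointed is left open here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-executor cache of lazily created readers, keyed by split id. */
final class LazyReaderPool<K, R> {
    private final Map<K, R> readers = new ConcurrentHashMap<>();
    private final Function<K, R> factory;

    LazyReaderPool(Function<K, R> factory) {
        this.factory = factory;
    }

    /** Returns the cached reader for the split, creating it on first use only. */
    R readerFor(K splitId) {
        return readers.computeIfAbsent(splitId, factory);
    }
}

final class LazyReaderPoolDemo {
    /** Simulates micro-batches over one split and returns how many readers were created. */
    static int simulate(int numBatches) {
        AtomicInteger created = new AtomicInteger();
        LazyReaderPool<String, StringBuilder> pool = new LazyReaderPool<>(id -> {
            created.incrementAndGet();      // stands in for the expensive reader setup
            return new StringBuilder(id);   // stand-in for a real reader
        });
        for (int batch = 0; batch < numBatches; batch++) {
            pool.readerFor("split-0");      // every batch asks for the same split's reader
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("readers created over 5 batches: " + simulate(5));
    }
}
```

With this shape the creation cost is paid once per split per executor rather
than once per micro-batch, which is the effect the proposal is after.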
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)