Hi,

I'm working on the Apex runner (
https://github.com/apache/incubator-beam/pull/540) and based on the
integration test results my next target is support for PCollectionView.

I looked at the side inputs doc (
https://s.apache.org/beam-side-inputs-1-pager) and see that a suggested
implementation approach is RPC.

Apex is a streaming engine: individual records flow through the pipeline,
and operators process data as soon as it becomes available. Hence I'm also
looking at delivering side inputs as a stream, rather than as a call to
fetch a specific record. But wouldn't that also require the ParDo operator
to hold on to the side input state until it is no longer needed (based on
expiry of the window)?
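
To make the question concrete, here is a rough sketch of the stream-based
idea in plain Java. It is not Beam or Apex API: the class and method names
are made up for illustration, windows are identified only by their end
timestamp, and allowed lateness is ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical operator sketch; every name here is an assumption. */
class SideInputBufferingOperator {
    // Side input value per window, keyed by window end timestamp (millis).
    private final Map<Long, String> sideInputByWindow = new HashMap<>();
    // Main-input elements that arrived before their window's side input.
    private final Map<Long, List<String>> pendingByWindow = new HashMap<>();
    // Stand-in for emitting to the downstream operator.
    private final List<String> output = new ArrayList<>();

    /** A side input record arrives on the side-input stream. */
    void onSideInput(long windowEnd, String value) {
        sideInputByWindow.put(windowEnd, value);
        // Release any main-input elements that were waiting for it.
        List<String> pending = pendingByWindow.remove(windowEnd);
        if (pending != null) {
            for (String element : pending) {
                process(element, value);
            }
        }
    }

    /** A main-input element arrives. */
    void onElement(long windowEnd, String element) {
        String side = sideInputByWindow.get(windowEnd);
        if (side == null) {
            // Side input not yet seen for this window: buffer the element.
            pendingByWindow.computeIfAbsent(windowEnd, k -> new ArrayList<>())
                    .add(element);
        } else {
            process(element, side);
        }
    }

    /** The watermark advances: drop all state for expired windows. */
    void onWatermark(long watermark) {
        sideInputByWindow.keySet().removeIf(end -> end <= watermark);
        // Simplification: still-pending elements of expired windows are dropped.
        pendingByWindow.keySet().removeIf(end -> end <= watermark);
    }

    private void process(String element, String side) {
        output.add(element + ":" + side);
    }

    List<String> getOutput() {
        return output;
    }

    int retainedWindows() {
        return sideInputByWindow.size();
    }
}
```

The point of the sketch is only the state lifecycle: the operator keeps one
side input value per window and frees it when the watermark passes the end
of that window.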

I would appreciate your thoughts on this. Is there a good streaming-based
implementation to look at for reference? Also, any suggestions on how to
break support for side inputs into multiple tasks that can be taken up
independently?

Thanks!
Thomas
