Hi, I'm working on the Apex runner (https://github.com/apache/incubator-beam/pull/540), and based on the integration test results, my next target is support for PCollectionView.
I looked at the side inputs doc (https://s.apache.org/beam-side-inputs-1-pager) and see that one suggested implementation approach is RPC. Apex is a streaming engine in which individual records flow through the pipeline and operators process data as soon as it becomes available. Hence I'm also looking at side inputs as a stream, as opposed to a call to fetch a specific record. That would, however, also require the ParDo operator to hold on to the side input state until it is no longer needed (based on expiry of the window), correct?

I would appreciate your thoughts on this. Is there a good streaming-based implementation to look at for reference? Also, are there any suggestions for breaking side input support into multiple tasks that can be taken up independently?

Thanks!
Thomas
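
P.S. To make the "side input as a stream" idea a bit more concrete, here is a rough sketch of the state the ParDo operator would have to keep. It is only an illustration under my own assumptions: the class and method names (SideInputBuffer, onSideInput, onWatermark) are hypothetical and are not Beam or Apex API, windows are identified by their exclusive end timestamp in milliseconds, and a window is treated as expired once the watermark reaches its end plus an allowed lateness.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names are Beam or Apex API. */
class SideInputBuffer<K, V> {
    // Assumption: a window is identified by its (exclusive) end timestamp in millis.
    private final Map<Long, Map<K, V>> stateByWindowEnd = new HashMap<>();
    private final long allowedLatenessMillis;

    SideInputBuffer(long allowedLatenessMillis) {
        this.allowedLatenessMillis = allowedLatenessMillis;
    }

    /** Called for every record that arrives on the side input stream. */
    void onSideInput(long windowEnd, K key, V value) {
        stateByWindowEnd.computeIfAbsent(windowEnd, w -> new HashMap<>()).put(key, value);
    }

    /** Local lookup from the main input path; returns null if nothing is buffered. */
    V lookup(long windowEnd, K key) {
        Map<K, V> window = stateByWindowEnd.get(windowEnd);
        return window == null ? null : window.get(key);
    }

    /** Called when the watermark advances; drops expired windows and returns how many. */
    int onWatermark(long watermark) {
        int before = stateByWindowEnd.size();
        stateByWindowEnd.keySet().removeIf(end -> end + allowedLatenessMillis <= watermark);
        return before - stateByWindowEnd.size();
    }

    /** Number of windows whose side input state is currently held. */
    int bufferedWindows() {
        return stateByWindowEnd.size();
    }
}
```

The sketch leaves out the harder part, namely what the operator should do with main input elements that arrive before the side input for their window is available (buffer them, or block).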
