I want to highlight that this design definitely works for more runners than just Dataflow. I see two pieces of it that I want to bring onto the thread:

1. A new kind of "unbounded source" that is a periodic refresh of a bounded
   source, used as a side input. Each main input element has a window that
   maps to a specific refresh of the side input.
2. Distributed map side inputs: support for very large lookup tables, but
   with consistency challenges.

Even the part about the "windmill API" probably applies to other runners.

So I hope the title and "Objective" section do not cause people to stop reading.

Kenn

On Mon, Dec 16, 2019 at 11:36 AM Mikhail Gryzykhin <mig...@google.com> wrote:

> +some people explicitly
>
> Can you please check on the doc and comment if it looks fine?
>
> Thank you,
> --Mikhail
>
> On Tue, Dec 10, 2019 at 1:43 PM Mikhail Gryzykhin <mig...@google.com>
> wrote:
>
>> "Good news, everyone-"
>> ―Farnsworth
>>
>> Hi everyone,
>>
>> Recently, I was looking into relaxing limitations on side inputs in
>> Dataflow runner. As part of it, I came up with design proposal for
>> standardizing slowly changing dimensions use case in Beam and relevant
>> changes to add support for distributed map side inputs.
>>
>> Please review and comment on design doc.
>> <https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg>
>> [1]
>>
>> Thank you,
>> Mikhail.
>>
>> -----
>>
>> [1]
>> https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg