On Fri, Oct 26, 2018 at 6:47 PM Kenneth Knowles <[email protected]> wrote: > > It all sounds very useful but I have basic concerns about item 1. The doc > doesn't really seem to go into the design concerns that I have in mind. > > - map / flatMap are universal functions with definitions that we don't own > and shouldn't violate > - corollary: map / flatMap have per element parallelism with no dependencies > between them > - the doc says "map is just implemented as DoFn.process()" which is > implementation, not the spec
That's an interesting perspective. I had always thought of Map/FlatMap as convenient ways of defining a ParDo from a lambda without the baggage of having to provide a full DoFn class, similar to MapElements in Java, but your interpretation is that the names themselves imply no inter-element interaction (e.g. via state) is allowed. > So suggestions: > > - how about just give it a new name making it clear it is not map nor > flatMap and does not have per-element parallelism > - spec out the functionality without reference to DoFn > - be explicit about what determines the maximum parallelism / which elements > are required to be processed serially (generally key+window) These probably don't merit a new (pair of) named operation(s). The motivation to add them was "why can I use state in a DoFn, but not in Map/FlatMap?" which could be justified by the above. As for the side input question, definitely (A). > On Fri, Oct 26, 2018 at 2:49 AM Robert Bradshaw <[email protected]> wrote: >> >> Thanks. They make sense to me (and would have been handy when I was >> writing the state tests for the Fn API). >> On Fri, Oct 26, 2018 at 10:48 AM Charles Chen <[email protected]> wrote: >> > >> > Hey there, >> > >> > A while back, I shared the Beam Python State and Timers API proposal >> > (https://s.apache.org/beam-python-user-state-and-timers) with this list >> > [1]; we reached consensus on the features proposed there and I implemented >> > the API surface described there, along with the reference DirectRunner >> > implementation and some shared code for executing such pipelines (see for >> > example https://github.com/apache/beam/pull/5691 and >> > https://github.com/apache/beam/pull/6304). >> > >> > I would like to propose some additional design considerations to improve >> > upon the previous State / Timer API proposal as described here (on pages >> > 14-18 that I have appended to the doc): >> > https://docs.google.com/document/d/1GadEkAmtbJQjmqiqfSzGw3b66TKerm8tyn6TK4blAys/edit#heading=h.10nb33sz7u16 >> > >> > Briefly, these are the additional design considerations proposed to >> > improve upon the API: >> > >> > Allowing access to user state / timers in Map / FlatMap callables / lambdas >> > Allowing access to the key during the timer callback >> > Allowing access to auxiliary timer data >> > Allowing access to side inputs during the timer callback >> > >> > I would really appreciate any feedback you may have. Thanks! >> > >> > Best, >> > Charles >> > >> > >> > [1] Previous dev@ discussion thread: >> > https://lists.apache.org/thread.html/51ba1a00027ad8635bc1d2c0df805ce873995170c75d6a08dfe21997@%3Cdev.beam.apache.org%3E
