+1, Nice presentation
On 7/28/16, 2:17 AM, "Amit Sela" <[email protected]> wrote: >+1 > >A great way to avoid "noise" in the code, and as usual, an >extremely thorough presentation & documentation. >I'll take a look at the (initial) PR and also add the Spark section in the >document soon. > >Thanks! > >On Thu, Jul 28, 2016 at 9:53 AM Aljoscha Krettek <[email protected]> >wrote: > >> +1 >> >> At first I liked the API but was skeptical because I though that this would >> require reflective invocation. Then I read on and saw that code generation >> is used and was convinced. :-) >> >> I especially like how it both cleans up the API and allows more >> optimizations in the future, especially with side inputs and the different >> methods for emitting. >> >> On Wed, 27 Jul 2016 at 06:49 Jean-Baptiste Onofré <[email protected]> wrote: >> >> > >> > >> > +1 >> > I like the proposal and great description. >> > ThanksRegards JB >> > >> > -------- Original message -------- >> > From: Kenneth Knowles <[email protected]> >> > Date: 27/07/2016 05:29 (GMT+01:00) >> > To: [email protected] >> > Subject: [PROPOSAL] A brand new DoFn >> > >> > Hi all, >> > >> > I have a major new feature to propose: the next generation of DoFn. >> > >> > It sounds a bit grandiose, but I think it is the best way to understand >> the >> > proposal. >> > >> > This is strongly motivated by the design for state and timers, aka >> "per-key >> > workflows". Since the two features are separable and have separate design >> > docs, I have started a separate thread for each. >> > >> > To get a quick overview of the proposal for a new DoFn, and how it >> improves >> > upon the flexibility and validation of DoFn, browse this presentation: >> > >> > https://s.apache.org/presenting-a-new-dofn >> > >> > Due to the extent of this proposal, Ben & I have also prepared an >> in-depth >> > document at https://s.apache.org/a-new-dofn with additional details. >> > Please >> > comment on particulars there, or just reply to this email. >> > >> > The remainder of this email is yet another summary of the proposal, to >> > entice you to read the documents above and respond with a "+1". >> > >> > This is a feature that has been an experimental feature of the Java SDK >> for >> > some time, under the name DoFnWithContext. For the purposes of this email >> > and the linked documents, I will call it NewDoFn and I will call the >> status >> > quo OldDoFn. >> > >> > The differences between NewDoFn and and OldDoFn are most easily >> understood >> > with a quick code snippet: >> > >> > new OldDoFn<Foo, Baz>() { >> > @Override >> > public void processElement(ProcessContext c) { … } >> > } >> > >> > new NewDoFn<Foo, Baz>() { >> > @ProcessElement // <-- This is the only difference >> > public void processElement(ProcessContext c) { … } >> > } >> > >> > What changed? NewDoFn uses annotation-based dispatch instead of method >> > overrides. The method annotated with @ProcessElement is used to process >> > elements. It can have any name or signature, and validation is performed >> at >> > pipeline construction time. >> > >> > Why do this? It allows the argument list for processElement to change. >> This >> > approach gives NewDoFn many advantages, which are demonstrated at length >> in >> > the linked documents. Here are some highlights: >> > >> > - Simpler backwards-compatible approaches to new features >> > - Simpler composition of advanced features >> > - Greater pipeline construction-time validation >> > - Easier evolution of a simple anonymous DoFn into one that uses >> advanced >> > features >> > >> > Here are some abbreviated demonstrations of things that work today or >> could >> > work easily with NewDoFn but require complex interrelated designs without >> > it: >> > >> > Access the element's window: >> > >> > new NewDoFn<Foo, Baz>() { >> > @ProcessElement >> > public void processElement(ProcessContext c, BoundedWindow w) { … } >> > } >> > >> > Use persistent state: >> > >> > new NewDoFn<Foo, Baz>() { >> > @ProcessElement >> > public void processElement( >> > ProcessContext c, >> > @StateId("cell-id") ValueState<MyStateValue> state) { >> > … >> > } >> > } >> > >> > Set and receive timers: >> > >> > new NewDoFn<Foo, Baz>() { >> > @ProcessElement >> > public void processElement( >> > ProcessContext c, >> > @TimerId("timer-id") Timer state) { >> > … >> > } >> > >> > @OnTimer("timer-id") >> > void onMyTimer(OnTimerContext) { … } >> > } >> > >> > Receive a side input as a parameter: >> > >> > new NewDoFn<Foo, Baz>() { >> > @ProcessElement >> > public void processElement( >> > ProcessContext c, >> > @SideInput Supplier<T> side) { >> > … >> > } >> > } >> > >> > So this is what I am proposing: We should move the Beam Java SDK to >> > NewDoFn! >> > >> > My proposed migration plan is: >> > >> > 1. leave a git tag before anything, so users can pin to it >> > 2. mv DoFn OldDoFn && mv DoFnWithContext DoFn >> > 3. get everything working with all runners >> > 4. rm OldDoFn # a few weeks later >> > >> > This will affect bleeding edge users, who will need to replace @Override >> > with @ProcessElement in all their DoFns. They can also pin to a commit >> > prior to the change or temporarily replace DoFn with OldDoFn everywhere. >> > >> > I've already done step 2 in a branch at >> > https://github.com/kennknowles/incubator-beam/DoFnWithContext and >> ported a >> > few examples in their own commits. If you view those commits, you can see >> > how simple the migration path is. >> > >> > Please let me know what you think. It is a big change, but one that I >> think >> > yields pretty nice rewards. >> > >> > Kenn >> > >>
