Thanks for checking it out! I appreciate the feedback, here and in the doc.

I've filed a tracking ticket at BEAM-498 [1]. Let's continue on PR #758 [2]
which takes the first step.

[1] https://issues.apache.org/jira/browse/BEAM-498
[2] https://github.com/apache/incubator-beam/pull/758

On Thu, Jul 28, 2016 at 2:22 AM, Aparup Banerjee (apbanerj) <
[email protected]> wrote:

> +1, Nice presentation
>
>
>
>
> On 7/28/16, 2:17 AM, "Amit Sela" <[email protected]> wrote:
>
> >+1
> >
> >A great way to avoid "noise" in the code, and as usual, an
> >extremely thorough presentation & documentation.
> >I'll take a look at the (initial) PR and also add the Spark section in the
> >document soon.
> >
> >Thanks!
> >
> >On Thu, Jul 28, 2016 at 9:53 AM Aljoscha Krettek <[email protected]>
> >wrote:
> >
> >> +1
> >>
> >> At first I liked the API but was skeptical because I thought that this
> >> would require reflective invocation. Then I read on and saw that code
> >> generation is used and was convinced. :-)
> >>
> >> I especially like how it both cleans up the API and allows more
> >> optimizations in the future, especially with side inputs and the
> >> different methods for emitting.
> >>
> >> On Wed, 27 Jul 2016 at 06:49 Jean-Baptiste Onofré <[email protected]>
> >> wrote:
> >>
> >> >
> >> >
> >> > +1
> >> > I like the proposal and great description.
> >> > Thanks,
> >> > Regards
> >> > JB
> >> >
> >> > -------- Original message --------
> >> > From: Kenneth Knowles <[email protected]>
> >> > Date: 27/07/2016  05:29  (GMT+01:00)
> >> > To: [email protected]
> >> > Subject: [PROPOSAL] A brand new DoFn
> >> >
> >> > Hi all,
> >> >
> >> > I have a major new feature to propose: the next generation of DoFn.
> >> >
> >> > It sounds a bit grandiose, but I think it is the best way to understand
> >> > the proposal.
> >> >
> >> > This is strongly motivated by the design for state and timers, aka
> >> > "per-key workflows". Since the two features are separable and have
> >> > separate design docs, I have started a separate thread for each.
> >> >
> >> > To get a quick overview of the proposal for a new DoFn, and how it
> >> > improves upon the flexibility and validation of DoFn, browse this
> >> > presentation:
> >> >
> >> >   https://s.apache.org/presenting-a-new-dofn
> >> >
> >> > Due to the extent of this proposal, Ben & I have also prepared an
> >> > in-depth document at https://s.apache.org/a-new-dofn with additional
> >> > details. Please comment on particulars there, or just reply to this
> >> > email.
> >> >
> >> > The remainder of this email is yet another summary of the proposal, to
> >> > entice you to read the documents above and respond with a "+1".
> >> >
> >> > This has been an experimental feature of the Java SDK for some time,
> >> > under the name DoFnWithContext. For the purposes of this email and the
> >> > linked documents, I will call it NewDoFn and I will call the status quo
> >> > OldDoFn.
> >> >
> >> > The differences between NewDoFn and OldDoFn are most easily understood
> >> > with a quick code snippet:
> >> >
> >> >     new OldDoFn<Foo, Baz>() {
> >> >       @Override
> >> >       public void processElement(ProcessContext c) { … }
> >> >     }
> >> >
> >> >     new NewDoFn<Foo, Baz>() {
> >> >       @ProcessElement   // <-- This is the only difference
> >> >       public void processElement(ProcessContext c) { … }
> >> >     }
> >> >
> >> > What changed? NewDoFn uses annotation-based dispatch instead of method
> >> > overrides. The method annotated with @ProcessElement is used to process
> >> > elements. It can have any name or signature, and validation is performed
> >> > at pipeline construction time.
> >> >
> >> > Why do this? It allows the argument list for processElement to change.
> >> > This approach gives NewDoFn many advantages, which are demonstrated at
> >> > length in the linked documents. Here are some highlights:
> >> >
> >> >  - Simpler backwards-compatible approaches to new features
> >> >  - Simpler composition of advanced features
> >> >  - Greater pipeline construction-time validation
> >> >  - Easier evolution of a simple anonymous DoFn into one that uses
> >> >    advanced features
> >> >
> >> > Here are some abbreviated demonstrations of things that work today or
> >> > could work easily with NewDoFn but require complex interrelated designs
> >> > without it:
> >> >
> >> > Access the element's window:
> >> >
> >> >     new NewDoFn<Foo, Baz>() {
> >> >       @ProcessElement
> >> >       public void processElement(ProcessContext c, BoundedWindow w) { … }
> >> >     }
> >> >
> >> > Use persistent state:
> >> >
> >> >     new NewDoFn<Foo, Baz>() {
> >> >       @ProcessElement
> >> >       public void processElement(
> >> >           ProcessContext c,
> >> >           @StateId("cell-id") ValueState<MyStateValue> state) {
> >> >         …
> >> >       }
> >> >     }
> >> >
> >> > Set and receive timers:
> >> >
> >> >     new NewDoFn<Foo, Baz>() {
> >> >       @ProcessElement
> >> >       public void processElement(
> >> >           ProcessContext c,
> >> >           @TimerId("timer-id") Timer timer) {
> >> >         …
> >> >       }
> >> >
> >> >       @OnTimer("timer-id")
> >> >       void onMyTimer(OnTimerContext c) { … }
> >> >     }
> >> >
> >> > Receive a side input as a parameter:
> >> >
> >> >     new NewDoFn<Foo, Baz>() {
> >> >       @ProcessElement
> >> >       public void processElement(
> >> >           ProcessContext c,
> >> >           @SideInput Supplier<T> side) {
> >> >         …
> >> >       }
> >> >     }
> >> >
> >> > So this is what I am proposing: We should move the Beam Java SDK to
> >> > NewDoFn!
> >> >
> >> > My proposed migration plan is:
> >> >
> >> > 1. leave a git tag before anything, so users can pin to it
> >> > 2. mv DoFn OldDoFn && mv DoFnWithContext DoFn
> >> > 3. get everything working with all runners
> >> > 4. rm OldDoFn # a few weeks later
> >> >
> >> > This will affect bleeding edge users, who will need to replace @Override
> >> > with @ProcessElement in all their DoFns. They can also pin to a commit
> >> > prior to the change or temporarily replace DoFn with OldDoFn everywhere.
> >> >
> >> > I've already done step 2 in a branch at
> >> > https://github.com/kennknowles/incubator-beam/DoFnWithContext and ported
> >> > a few examples in their own commits. If you view those commits, you can
> >> > see how simple the migration path is.
> >> >
> >> > Please let me know what you think. It is a big change, but one that I
> >> > think yields pretty nice rewards.
> >> >
> >> > Kenn
> >> >
> >>
>
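
For readers new to the thread, the annotation-based dispatch described in the quoted proposal can be illustrated with a small standalone sketch. It is not Beam code: the `ProcessElement` annotation here is a local stand-in declared inside the example, and `AnnotationDispatchSketch`, `findProcessElement`, `invoke`, `Upper`, and `Invalid` are hypothetical names. It also uses plain reflection for brevity, whereas the thread notes that the actual design relies on code generation. The point is only to show how a method of any name can be located by annotation and rejected early when the class is malformed.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class AnnotationDispatchSketch {

  /** Local stand-in for the proposal's annotation (assumption: not Beam's own type). */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface ProcessElement {}

  /**
   * Finds the single annotated method. Failing here, before any element is
   * processed, mimics validation at pipeline construction time.
   */
  static Method findProcessElement(Class<?> fnClass) {
    List<Method> found = new ArrayList<>();
    for (Method m : fnClass.getDeclaredMethods()) {
      if (m.isAnnotationPresent(ProcessElement.class)) {
        found.add(m);
      }
    }
    if (found.size() != 1) {
      throw new IllegalArgumentException(
          fnClass.getName() + " must declare exactly one @ProcessElement method, found "
              + found.size());
    }
    Method method = found.get(0);
    method.setAccessible(true); // allow non-public methods in this sketch
    return method;
  }

  /** Dispatches to the annotated method by reflection (the real design generates code). */
  static Object invoke(Object fn, Object... args) {
    try {
      return findProcessElement(fn.getClass()).invoke(fn, args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("dispatch failed", e);
    }
  }

  /** The annotated method can have any name. */
  static class Upper {
    @ProcessElement
    String anyNameWorks(String element) {
      return element.toUpperCase();
    }
  }

  /** No annotated method, so validation rejects this class. */
  static class Invalid {
    void notAnnotated(String element) {}
  }

  public static void main(String[] args) {
    System.out.println(invoke(new Upper(), "foo")); // prints "FOO"
    try {
      findProcessElement(Invalid.class);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the method is looked up by annotation rather than by override, its parameter list is free to grow (window, state, timers, side inputs) without changing a base-class signature, which is the flexibility the proposal is after.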
