Re: [DISCUSS] Capability Matrix revamp

Griselda Cuevas Tue, 22 Aug 2017 10:46:33 -0700

Hi, I'd also like to ask if versioning as proposed in BEAM-166
<https://issues.apache.org/jira/browse/BEAM-166> is still relevant? If it
is, would this be something we want to add to this proposal?


G

On 21 August 2017 at 08:31, Tyler Akidau <[email protected]> wrote:

> Is there any way we could add quantitative runner metrics to this as well?
> Like by having some benchmarks that process X amount of data, and then
> detailing in the matrix latency, throughput, and (where possible) cost,
> etc, numbers for each of the given runners? Semantic support is one thing,
> but there are other differences between runners that aren't captured by
> just checking feature boxes. I'd be curious if anyone has other ideas in
> this vein as well. The benchmark idea might not be the best way to go about
> it.
>
> -Tyler
>
> On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <[email protected]>
> wrote:
>
> > It'd be awesome to see these updated. I'd add two more:
> >
> >    1. A plain English summary of the runner's support in Beam. People who
> >    are new to Beam won't understand the in-depth coverage and need a
> >    general idea of how it is supported.
> >    2. The production readiness of the runner. Does the maintainer think
> >    this runner is production ready?
> >
> >
> >
> > On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > I want to revamp
> > > https://beam.apache.org/documentation/runners/capability-matrix/
> > >
> > > When Beam first started, we didn't work on feature branches for the core
> > > runners, and they had a lot more gaps compared to what goes on `master`
> > > today, so this tracked our progress in a way that was easy for users to
> > > read. Now it is still our best/only comparison page for users, but I think
> > > we could improve its usefulness.
> > >
> > > For the benefit of the thread, let me inline all the capabilities fully
> > > here:
> > >
> > > ========================
> > >
> > > "What is being computed?"
> > >  - ParDo
> > >  - GroupByKey
> > >  - Flatten
> > >  - Combine
> > >  - Composite Transforms
> > >  - Side Inputs
> > >  - Source API
> > >  - Splittable DoFn
> > >  - Metrics
> > >  - Stateful Processing
> > >
> > > "Where in event time?"
> > >  - Global windows
> > >  - Fixed windows
> > >  - Sliding windows
> > >  - Session windows
> > >  - Custom windows
> > >  - Custom merging windows
> > >  - Timestamp control
> > >
> > > "When in processing time?"
> > >  - Configurable triggering
> > >  - Event-time triggers
> > >  - Processing-time triggers
> > >  - Count triggers
> > >  - [Meta]data driven triggers
> > >  - Composite triggers
> > >  - Allowed lateness
> > >  - Timers
> > >
> > > "How do refinements relate?"
> > >  - Discarding
> > >  - Accumulating
> > >  - Accumulating & Retracting
> > >
> > > ========================
> > >
> > > Here are some issues I'd like to improve:
> > >
> > >  - Rows that are impossible to not support (ParDo)
> > >  - Rows where "support" doesn't really make sense (Composite transforms)
> > >  - Rows that are actually the same model feature (non-merging windowfns)
> > >  - Rows that represent optimizations (Combine)
> > >  - Rows in the wrong place (Timers)
> > >  - Rows that have not been designed ([Meta]data driven triggers)
> > >  - Rows with names that appear nowhere else (Timestamp control)
> > >  - No place to compare non-model differences between runners
> > >
> > > I'm still pondering how to improve this, but I thought I'd send the notion
> > > out for discussion. Some imperfect ideas I've had:
> > >
> > > 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
> > > 2. Make sections as users see them, like "ParDo" / "side inputs" not
> > > "What?" / "side inputs"
> > > 3. Add rows for non-model things, like portability framework support,
> > > metrics backends, etc
> > > 4. Drop rows that are not informative, like Composite transforms, or not
> > > designed
> > > 5. Reorganize the windowing section to be just support for merging /
> > > non-merging windowing.
> > > 6. Switch to a more distinct color scheme than the solid vs faded colors
> > > currently used.
> > > 7. Find a web design to get short descriptions into the foreground to make
> > > it easier to grok.
> > >
> > > These are just a few thoughts, and not necessarily compatible with each
> > > other. What do you think?
> > >
> > > Kenn
> > >
> > --
> > Thanks,
> >
> > Jesse
> >
>
