Re: [DISCUSS] Capability Matrix revamp

Kenneth Knowles Tue, 22 Aug 2017 19:24:07 -0700

OK, so adding these good ideas to the list:

8. Plain-English summary that comes before the nitty-gritty.
9. Comment on production readiness from maintainers. Maybe testimonials are
helpful if they can be obtained?
10. Versioning of all of the above


Any more thoughts? I'll summarize in a JIRA in a bit.

Kenn

On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas <[email protected]>
wrote:

> Hi, I'd also like to ask if versioning as proposed in BEAM-166
> <https://issues.apache.org/jira/browse/BEAM-166> is still relevant? If it
> is, would this be something we want to add to this proposal?
>
> G
>
> On 21 August 2017 at 08:31, Tyler Akidau <[email protected]>
> wrote:
>
> > Is there any way we could add quantitative runner metrics to this as
> > well? Like by having some benchmarks that process X amount of data, and
> > then detailing in the matrix latency, throughput, and (where possible)
> > cost, etc, numbers for each of the given runners? Semantic support is one
> > thing, but there are other differences between runners that aren't
> > captured by just checking feature boxes. I'd be curious if anyone has
> > other ideas in this vein as well. The benchmark idea might not be the
> > best way to go about it.
> >
> > -Tyler
> >
> > On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <[email protected]>
> > wrote:
> >
> > > It'd be awesome to see these updated. I'd add two more:
> > >
> > >    1. A plain English summary of the runner's support in Beam. People who
> > >    are new to Beam won't understand the in-depth coverage and need a
> > >    general idea of how it is supported.
> > >    2. The production readiness of the runner. Does the maintainer think
> > >    this runner is production ready?
> > >
> > >
> > >
> > > On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I want to revamp
> > > > https://beam.apache.org/documentation/runners/capability-matrix/
> > > >
> > > > When Beam first started, we didn't work on feature branches for the
> > > > core runners, and they had a lot more gaps compared to what goes on
> > > > `master` today, so this tracked our progress in a way that was easy
> > > > for users to read. Now it is still our best/only comparison page for
> > > > users, but I think we could improve its usefulness.
> > > >
> > > > For the benefit of the thread, let me inline all the capabilities
> > > > fully here:
> > > >
> > > > ========================
> > > >
> > > > "What is being computed?"
> > > >  - ParDo
> > > >  - GroupByKey
> > > >  - Flatten
> > > >  - Combine
> > > >  - Composite Transforms
> > > >  - Side Inputs
> > > >  - Source API
> > > >  - Splittable DoFn
> > > >  - Metrics
> > > >  - Stateful Processing
> > > >
> > > > "Where in event time?"
> > > >  - Global windows
> > > >  - Fixed windows
> > > >  - Sliding windows
> > > >  - Session windows
> > > >  - Custom windows
> > > >  - Custom merging windows
> > > >  - Timestamp control
> > > >
> > > > "When in processing time?"
> > > >  - Configurable triggering
> > > >  - Event-time triggers
> > > >  - Processing-time triggers
> > > >  - Count triggers
> > > >  - [Meta]data driven triggers
> > > >  - Composite triggers
> > > >  - Allowed lateness
> > > >  - Timers
> > > >
> > > > "How do refinements relate?"
> > > >  - Discarding
> > > >  - Accumulating
> > > >  - Accumulating & Retracting
> > > >
> > > > ========================
> > > >
> > > > Here are some issues I'd like to improve:
> > > >
> > > >  - Rows that are impossible to not support (ParDo)
> > > >  - Rows where "support" doesn't really make sense (Composite transforms)
> > > >  - Rows that are actually the same model feature (non-merging windowfns)
> > > >  - Rows that represent optimizations (Combine)
> > > >  - Rows in the wrong place (Timers)
> > > >  - Rows that have not been designed ([Meta]data driven triggers)
> > > >  - Rows with names that appear nowhere else (Timestamp control)
> > > >  - No place to compare non-model differences between runners
> > > >
> > > > I'm still pondering how to improve this, but I thought I'd send the
> > > > notion out for discussion. Some imperfect ideas I've had:
> > > >
> > > > 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
> > > > 2. Make sections as users see them, like "ParDo" / "Side Inputs" not
> > > > "What?" / "Side Inputs"
> > > > 3. Add rows for non-model things, like portability framework support,
> > > > metrics backends, etc
> > > > 4. Drop rows that are not informative, like Composite transforms, or
> > > > that have not been designed
> > > > 5. Reorganize the windowing section to be just support for merging /
> > > > non-merging windowing.
> > > > 6. Switch to a more distinct color scheme than the solid vs faded
> > > > colors currently used.
> > > > 7. Find a web design to get short descriptions into the foreground to
> > > > make it easier to grok.
> > > >
> > > > These are just a few thoughts, and not necessarily compatible with
> > > > each other. What do you think?
> > > >
> > > > Kenn
> > > >
> > > --
> > > Thanks,
> > >
> > > Jesse
> > >
> >
>
