Is there any way we could add quantitative runner metrics to this as
well? For example, by having some benchmarks that process a fixed amount
of data, and then detailing in the matrix the latency, throughput, and
(where possible) cost numbers for each of the runners? Semantic support
is one thing, but there are other differences between runners that
aren't captured by just checking feature boxes. I'd be curious if anyone
has other ideas in this vein as well, since the benchmark idea might not
be the best way to go about it.
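Roughly this kind of harness, as a sketch (the pipeline shape, sizes, and
class name here are made up for illustration, not an existing Beam
benchmark suite):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.GenerateSequence;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.GroupByKey;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.SimpleFunction;
    import org.apache.beam.sdk.values.KV;

    public class ShuffleBenchmark {
      // The "X amount of data" each runner processes.
      private static final long NUM_ELEMENTS = 10_000_000L;

      public static void main(String[] args) {
        // Same pipeline every time; only the --runner=... flag changes.
        PipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);

        p.apply("Generate", GenerateSequence.from(0).to(NUM_ELEMENTS))
            .apply("KeyByMod", MapElements.via(
                new SimpleFunction<Long, KV<Long, Long>>() {
                  @Override
                  public KV<Long, Long> apply(Long x) {
                    return KV.of(x % 1000, x); // 1000 keys forces a shuffle
                  }
                }))
            .apply("Shuffle", GroupByKey.create());

        long startMs = System.currentTimeMillis();
        p.run().waitUntilFinish();
        long elapsedMs = System.currentTimeMillis() - startMs;
        System.out.printf("%d elements in %d ms (%.0f elements/sec)%n",
            NUM_ELEMENTS, elapsedMs, NUM_ELEMENTS * 1000.0 / elapsedMs);
      }
    }

Wall-clock timing like this mixes job-submission overhead into the
number, and cost would have to be collected outside the pipeline, so it
would only be a starting point.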

-Tyler

On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <je...@bigdatainstitute.io>
wrote:

> It'd be awesome to see these updated. I'd add two more:
>
>    1. A plain English summary of the runner's support in Beam. People who
>    are new to Beam won't understand the in-depth coverage and need a
>    general idea of how it is supported.
>    2. The production readiness of the runner. Does the maintainer think
>    this runner is production ready?
>
> On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <k...@google.com.invalid>
> wrote:
>
> > Hi all,
> >
> > I want to revamp
> > https://beam.apache.org/documentation/runners/capability-matrix/
> >
> > When Beam first started, we didn't work on feature branches for the core
> > runners, and they had a lot more gaps compared to what goes on `master`
> > today, so this tracked our progress in a way that was easy for users to
> > read. Now it is still our best/only comparison page for users, but I
> > think we could improve its usefulness.
> >
> > For the benefit of the thread, let me inline all the capabilities fully
> > here:
> >
> > ========================
> >
> > "What is being computed?"
> >  - ParDo
> >  - GroupByKey
> >  - Flatten
> >  - Combine
> >  - Composite Transforms
> >  - Side Inputs
> >  - Source API
> >  - Splittable DoFn
> >  - Metrics
> >  - Stateful Processing
> >
> > "Where in event time?"
> >  - Global windows
> >  - Fixed windows
> >  - Sliding windows
> >  - Session windows
> >  - Custom windows
> >  - Custom merging windows
> >  - Timestamp control
> >
> > "When in processing time?"
> >  - Configurable triggering
> >  - Event-time triggers
> >  - Processing-time triggers
> >  - Count triggers
> >  - [Meta]data driven triggers
> >  - Composite triggers
> >  - Allowed lateness
> >  - Timers
> >
> > "How do refinements relate?"
> >  - Discarding
> >  - Accumulating
> >  - Accumulating & Retracting
> >
> > ========================
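> > To make these concrete before discussing problems, here is the sort of
> > snippet the rows describe (an illustrative fragment only; it assumes an
> > existing PCollection<KV<String, Long>> named input, plus the usual
> > imports for Window, FixedWindows, AfterWatermark, AfterProcessingTime,
> > Count, and org.joda.time.Duration):
> >
> >   // Fixed windows ("Where in event time?"), an event-time trigger with
> >   // early processing-time firings ("When in processing time?"), allowed
> >   // lateness, and accumulating panes ("How do refinements relate?").
> >   input
> >       .apply(Window.<KV<String, Long>>into(
> >               FixedWindows.of(Duration.standardMinutes(1)))
> >           .triggering(AfterWatermark.pastEndOfWindow()
> >               .withEarlyFirings(AfterProcessingTime
> >                   .pastFirstElementInPane()
> >                   .plusDelayOf(Duration.standardSeconds(10))))
> >           .withAllowedLateness(Duration.standardMinutes(5))
> >           .accumulatingFiredPanes())
> >       .apply(Count.perKey());
> >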
> >
> > Here are some issues I'd like to improve:
> >
> >  - Rows that are impossible to not support (ParDo)
> >  - Rows where "support" doesn't really make sense (Composite Transforms)
> >  - Rows that are actually the same model feature (non-merging WindowFns)
> >  - Rows that represent optimizations (Combine)
> >  - Rows that are in the wrong place (Timers)
> >  - Rows for features that have not been designed yet ([Meta]data driven
> > triggers)
> >  - Rows with names that appear nowhere else (Timestamp control)
> >  - No place to compare non-model differences between runners
> >
> > I'm still pondering how to improve this, but I thought I'd send the
> > notion out for discussion. Some imperfect ideas I've had:
> >
> > 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one
> > row
> > 2. Make sections match what users see, like "ParDo" / "Side Inputs", not
> > "What?" / "side inputs"
> > 3. Add rows for non-model things, like portability framework support,
> > metrics backends, etc
> > 4. Drop rows that are not informative, like Composite Transforms, or
> > that cover features not yet designed
> > 5. Reorganize the windowing section to be just support for merging /
> > non-merging windowing.
> > 6. Switch to a more distinct color scheme than the solid vs faded colors
> > currently used.
> > 7. Find a web design to get short descriptions into the foreground to
> > make it easier to grok.
> >
> > These are just a few thoughts, and not necessarily compatible with each
> > other. What do you think?
> >
> > Kenn
> >
> --
> Thanks,
>
> Jesse
>
