OK, so adding these good ideas to the list:

8. Plain-English summary that comes before the nitty-gritty.
9. Comment on production readiness from maintainers. Maybe testimonials are helpful if they can be obtained?
10. Versioning of all of the above

Any more thoughts? I'll summarize in a JIRA in a bit.

Kenn

On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas <g...@google.com.invalid> wrote:

> Hi, I'd also like to ask if versioning as proposed in BEAM-166
> <https://issues.apache.org/jira/browse/BEAM-166> is still relevant? If it
> is, would this be something we want to add to this proposal?
>
> G
>
> On 21 August 2017 at 08:31, Tyler Akidau <taki...@google.com.invalid> wrote:
>
> > Is there any way we could add quantitative runner metrics to this as well?
> > Like by having some benchmarks that process X amount of data, and then
> > detailing in the matrix latency, throughput, and (where possible) cost,
> > etc, numbers for each of the given runners? Semantic support is one thing,
> > but there are other differences between runners that aren't captured by
> > just checking feature boxes. I'd be curious if anyone has other ideas in
> > this vein as well. The benchmark idea might not be the best way to go
> > about it.
> >
> > -Tyler
> >
> > On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <je...@bigdatainstitute.io> wrote:
> >
> > > It'd be awesome to see these updated. I'd add two more:
> > >
> > > 1. A plain English summary of the runner's support in Beam. People who
> > >    are new to Beam won't understand the in-depth coverage and need a
> > >    general idea of how it is supported.
> > > 2. The production readiness of the runner. Does the maintainer think
> > >    this runner is production ready?
> > >
> > > On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <k...@google.com.invalid> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I want to revamp
> > > > https://beam.apache.org/documentation/runners/capability-matrix/
> > > >
> > > > When Beam first started, we didn't work on feature branches for the core
> > > > runners, and they had a lot more gaps compared to what goes on `master`
> > > > today, so this tracked our progress in a way that was easy for users to
> > > > read. Now it is still our best/only comparison page for users, but I
> > > > think we could improve its usefulness.
> > > >
> > > > For the benefit of the thread, let me inline all the capabilities fully
> > > > here:
> > > >
> > > > ========================
> > > >
> > > > "What is being computed?"
> > > > - ParDo
> > > > - GroupByKey
> > > > - Flatten
> > > > - Combine
> > > > - Composite Transforms
> > > > - Side Inputs
> > > > - Source API
> > > > - Splittable DoFn
> > > > - Metrics
> > > > - Stateful Processing
> > > >
> > > > "Where in event time?"
> > > > - Global windows
> > > > - Fixed windows
> > > > - Sliding windows
> > > > - Session windows
> > > > - Custom windows
> > > > - Custom merging windows
> > > > - Timestamp control
> > > >
> > > > "When in processing time?"
> > > > - Configurable triggering
> > > > - Event-time triggers
> > > > - Processing-time triggers
> > > > - Count triggers
> > > > - [Meta]data driven triggers
> > > > - Composite triggers
> > > > - Allowed lateness
> > > > - Timers
> > > >
> > > > "How do refinements relate?"
> > > > - Discarding
> > > > - Accumulating
> > > > - Accumulating & Retracting
> > > >
> > > > ========================
> > > >
> > > > Here are some issues I'd like to improve:
> > > >
> > > > - Rows that are impossible to not support (ParDo)
> > > > - Rows where "support" doesn't really make sense (Composite transforms)
> > > > - Rows that are actually the same model feature (non-merging windowfns)
> > > > - Rows that represent optimizations (Combine)
> > > > - Rows in the wrong place (Timers)
> > > > - Rows that have not been designed ([Meta]Data driven triggers)
> > > > - Rows with names that appear nowhere else (Timestamp control)
> > > > - No place to compare non-model differences between runners
> > > >
> > > > I'm still pondering how to improve this, but I thought I'd send the
> > > > notion out for discussion. Some imperfect ideas I've had:
> > > >
> > > > 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one
> > > >    row
> > > > 2. Make sections as users see them, like "ParDo" / "side Inputs" not
> > > >    "What?" / "side inputs"
> > > > 3. Add rows for non-model things, like portability framework support,
> > > >    metrics backends, etc
> > > > 4. Drop rows that are not informative, like Composite transforms, or not
> > > >    designed
> > > > 5. Reorganize the windowing section to be just support for merging /
> > > >    non-merging windowing.
> > > > 6. Switch to a more distinct color scheme than the solid vs faded colors
> > > >    currently used.
> > > > 7. Find a web design to get short descriptions into the foreground to
> > > >    make it easier to grok.
> > > >
> > > > These are just a few thoughts, and not necessarily compatible with each
> > > > other. What do you think?
> > > >
> > > > Kenn
> > >
> > > --
> > > Thanks,
> > >
> > > Jesse