+1 for having plain English feature descriptions. Nitpick: the capability matrix uses the "~" symbol, the meaning of which is not entirely clear from the context. I think a legend would be helpful given things have gone beyond ✘ and ✓.
-Stas

On Mon, Aug 28, 2017 at 7:23 PM Lukasz Cwik <lc...@google.com.invalid> wrote:

> I agree with you Aljoscha: a data-driven approach, where which features
> work is based upon summarized test results and which ones scale is based
> upon benchmarks, seems like a great way to differentiate runners'
> strengths.
>
> On Mon, Aug 28, 2017 at 8:39 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> > I like where this is going!
> >
> > Regarding benchmarking, I think we could do this if we had common
> > benchmarking infrastructure and pipelines that regularly run on
> > different Runners, so that we have up-to-date data.
> >
> > I think we can also have a more technical section where we show stats
> > on the level of support via the excluded ValidatesRunner tests. This is
> > hard data that we have on every Runner, and we can annotate it to
> > explain why a certain Runner has a given restriction. This is a bit
> > different from what Kenn initially suggested, but I think we should
> > have both. Plus, this very clearly specifies what feature is (somewhat)
> > validated to work in a given Runner.
> >
> > Regarding PCollectionView support in Flink, I think this actually
> > works, and the ValidatesRunner tests pass for this. Not sure what is
> > going on in that test case yet. For reference, this is the issue:
> > https://issues.apache.org/jira/browse/BEAM-2806
> >
> > Best,
> > Aljoscha
> >
> > > On 23. Aug 2017, at 21:24, Mingmin Xu <mingm...@gmail.com> wrote:
> > >
> > > I would like to have API compatibility testing. AFAIK there's still a
> > > gap to achieving our goal (one job for any runner), which means
> > > developers should notice the limitations when writing the job. For
> > > example, PCollectionView is not well supported in FlinkRunner (not
> > > quite sure of the current status, as my test job is broken) /
> > > SparkRunner streaming.
> > >
> > >> 5. Reorganize the windowing section to be just support for merging /
> > >> non-merging windowing.
> > >
> > > sliding/fixed_window/session is more straightforward to me;
> > > merging/non-merging is more about the backend implementation.
> > >
> > > On Tue, Aug 22, 2017 at 7:28 PM, Kenneth Knowles <k...@google.com.invalid> wrote:
> > >
> > >> Oh, I missed:
> > >>
> > >> 11. Quantitative properties. This seems like an interesting and
> > >> important project all on its own. Since Beam is so generic, we need
> > >> pretty diverse measurements for a user to have a hope of
> > >> extrapolating to their use case.
> > >>
> > >> Kenn
> > >>
> > >> On Tue, Aug 22, 2017 at 7:22 PM, Kenneth Knowles <k...@google.com> wrote:
> > >>
> > >>> OK, so adding these good ideas to the list:
> > >>>
> > >>> 8. Plain-English summary that comes before the nitty-gritty.
> > >>> 9. Comment on production readiness from maintainers. Maybe
> > >>> testimonials are helpful if they can be obtained?
> > >>> 10. Versioning of all of the above.
> > >>>
> > >>> Any more thoughts? I'll summarize in a JIRA in a bit.
> > >>>
> > >>> Kenn
> > >>>
> > >>> On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas <g...@google.com.invalid> wrote:
> > >>>
> > >>>> Hi, I'd also like to ask if versioning as proposed in BEAM-166
> > >>>> <https://issues.apache.org/jira/browse/BEAM-166> is still
> > >>>> relevant? If it is, would this be something we want to add to this
> > >>>> proposal?
> > >>>>
> > >>>> G
> > >>>>
> > >>>> On 21 August 2017 at 08:31, Tyler Akidau <taki...@google.com.invalid> wrote:
> > >>>>
> > >>>>> Is there any way we could add quantitative runner metrics to this
> > >>>>> as well? Like by having some benchmarks that process X amount of
> > >>>>> data, and then detailing in the matrix latency, throughput, and
> > >>>>> (where possible) cost, etc. numbers for each of the given
> > >>>>> runners? Semantic support is one thing, but there are other
> > >>>>> differences between runners that aren't captured by just checking
> > >>>>> feature boxes. I'd be curious if anyone has other ideas in this
> > >>>>> vein as well. The benchmark idea might not be the best way to go
> > >>>>> about it.
> > >>>>>
> > >>>>> -Tyler
> > >>>>>
> > >>>>> On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <je...@bigdatainstitute.io> wrote:
> > >>>>>
> > >>>>>> It'd be awesome to see these updated. I'd add two more:
> > >>>>>>
> > >>>>>> 1. A plain-English summary of the runner's support in Beam.
> > >>>>>> People who are new to Beam won't understand the in-depth
> > >>>>>> coverage and need a general idea of how it is supported.
> > >>>>>> 2. The production readiness of the runner. Does the maintainer
> > >>>>>> think this runner is production ready?
> > >>>>>>
> > >>>>>> On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <k...@google.com.invalid> wrote:
> > >>>>>>
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> I want to revamp
> > >>>>>>> https://beam.apache.org/documentation/runners/capability-matrix/
> > >>>>>>>
> > >>>>>>> When Beam first started, we didn't work on feature branches for
> > >>>>>>> the core runners, and they had a lot more gaps compared to what
> > >>>>>>> goes on `master` today, so this tracked our progress in a way
> > >>>>>>> that was easy for users to read. Now it is still our best/only
> > >>>>>>> comparison page for users, but I think we could improve its
> > >>>>>>> usefulness.
> > >>>>>>>
> > >>>>>>> For the benefit of the thread, let me inline all the
> > >>>>>>> capabilities fully here:
> > >>>>>>>
> > >>>>>>> ========================
> > >>>>>>>
> > >>>>>>> "What is being computed?"
> > >>>>>>> - ParDo
> > >>>>>>> - GroupByKey
> > >>>>>>> - Flatten
> > >>>>>>> - Combine
> > >>>>>>> - Composite Transforms
> > >>>>>>> - Side Inputs
> > >>>>>>> - Source API
> > >>>>>>> - Splittable DoFn
> > >>>>>>> - Metrics
> > >>>>>>> - Stateful Processing
> > >>>>>>>
> > >>>>>>> "Where in event time?"
> > >>>>>>> - Global windows
> > >>>>>>> - Fixed windows
> > >>>>>>> - Sliding windows
> > >>>>>>> - Session windows
> > >>>>>>> - Custom windows
> > >>>>>>> - Custom merging windows
> > >>>>>>> - Timestamp control
> > >>>>>>>
> > >>>>>>> "When in processing time?"
> > >>>>>>> - Configurable triggering
> > >>>>>>> - Event-time triggers
> > >>>>>>> - Processing-time triggers
> > >>>>>>> - Count triggers
> > >>>>>>> - [Meta]data driven triggers
> > >>>>>>> - Composite triggers
> > >>>>>>> - Allowed lateness
> > >>>>>>> - Timers
> > >>>>>>>
> > >>>>>>> "How do refinements relate?"
> > >>>>>>> - Discarding
> > >>>>>>> - Accumulating
> > >>>>>>> - Accumulating & Retracting
> > >>>>>>>
> > >>>>>>> ========================
> > >>>>>>>
> > >>>>>>> Here are some issues I'd like to improve:
> > >>>>>>>
> > >>>>>>> - Rows that are impossible to not support (ParDo)
> > >>>>>>> - Rows where "support" doesn't really make sense (Composite
> > >>>>>>>   transforms)
> > >>>>>>> - Rows that are actually the same model feature (non-merging
> > >>>>>>>   WindowFns)
> > >>>>>>> - Rows that represent optimizations (Combine)
> > >>>>>>> - Rows in the wrong place (Timers)
> > >>>>>>> - Rows that have not been designed ([Meta]data driven triggers)
> > >>>>>>> - Rows with names that appear nowhere else (Timestamp control)
> > >>>>>>> - No place to compare non-model differences between runners
> > >>>>>>>
> > >>>>>>> I'm still pondering how to improve this, but I thought I'd send
> > >>>>>>> the notion out for discussion. Some imperfect ideas I've had:
> > >>>>>>>
> > >>>>>>> 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window)
> > >>>>>>> into one row.
> > >>>>>>> 2. Make sections as users see them, like "ParDo" / "Side
> > >>>>>>> Inputs", not "What?" / "side inputs".
> > >>>>>>> 3. Add rows for non-model things, like portability framework
> > >>>>>>> support, metrics backends, etc.
> > >>>>>>> 4. Drop rows that are not informative, like Composite
> > >>>>>>> transforms, or not designed.
> > >>>>>>> 5. Reorganize the windowing section to be just support for
> > >>>>>>> merging / non-merging windowing.
> > >>>>>>> 6. Switch to a more distinct color scheme than the solid vs.
> > >>>>>>> faded colors currently used.
> > >>>>>>> 7. Find a web design to get short descriptions into the
> > >>>>>>> foreground to make it easier to grok.
> > >>>>>>>
> > >>>>>>> These are just a few thoughts, and not necessarily compatible
> > >>>>>>> with each other. What do you think?
> > >>>>>>>
> > >>>>>>> Kenn
> > >>>>>>
> > >>>>>> --
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Jesse
> > >
> > > --
> > > ----
> > > Mingmin
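[Editor's note] The merging vs. non-merging windowing distinction debated in this thread (Kenn's idea 5 vs. Mingmin's preference for sliding/fixed/session rows) can be made concrete with a minimal sketch. This is plain illustrative Python, not Beam SDK code; the function names `assign_fixed` and `merge_sessions` are invented for this example. The point it shows: fixed (and sliding) windows are assigned per element independently, while session windows must be merged across elements, which is the extra capability a runner has to support.

```python
def assign_fixed(timestamp, size):
    """Non-merging: each element maps to one window, independent of all others."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def merge_sessions(timestamps, gap):
    """Merging: each element starts a proto-window [t, t + gap), and overlapping
    windows are merged, so the final windows depend on the whole collection."""
    windows = sorted((t, t + gap) for t in timestamps)
    merged = [windows[0]]
    for start, end in windows[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:  # overlaps the previous session: extend it
            merged[-1] = (last_start, max(last_end, end))
        else:  # gap exceeded: a new session begins
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Fixed windows of size 10: assignment looks at one timestamp at a time.
    print([assign_fixed(t, 10) for t in [3, 12, 14]])   # [(0, 10), (10, 20), (10, 20)]
    # Session windows with gap 5: elements at 3 and 7 merge; 20 stands alone.
    print(merge_sessions([3, 7, 20], 5))                # [(3, 12), (20, 25)]
```

Under this framing, both rows of the matrix are defensible: "session windows" names the user-facing feature, while "merging windowing" names the single runner capability that sessions (and any custom merging WindowFn) actually exercise.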