Closing the loop on this thread, I've summarized the suggestions into a mega-ticket at https://issues.apache.org/jira/browse/BEAM-2888
Eventually, we'll need a redesign, but there is a lot that we can do
incrementally. If you want to help, make a subtask for the piece you are
handling, or I can make one if there's a permissions issue.

Kenn

On Thu, Aug 31, 2017 at 2:02 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Agree, it sounds like a good idea to me.
>
> Regards
> JB
>
> On 08/31/2017 10:35 AM, Etienne Chauchot wrote:
>> Hi,
>>
>> I think Nexmark (https://github.com/apache/beam/tree/master/sdks/java/nexmark)
>> could help in getting quantitative benchmark metrics for all the
>> runners, as Tyler suggested.
>>
>> Another thing: the current matrix might be wrong on custom window
>> merging. I think it should be *X* for Spark and Gearpump because of
>> the tickets below (even though I haven't tested it lately; maybe the
>> status has changed):
>>
>> https://issues.apache.org/jira/browse/BEAM-2759
>> https://issues.apache.org/jira/browse/BEAM-2499
>>
>> But since Kenn suggested grouping all the windowing features into
>> merging and non-merging window sections, maybe this detail does not
>> make sense anymore.
>>
>> Best
>>
>> Etienne
>>
>> On 23/08/2017 at 04:28, Kenneth Knowles wrote:
>>> Oh, I missed:
>>>
>>> 11. Quantitative properties. This seems like an interesting and
>>> important project all on its own. Since Beam is so generic, we need
>>> pretty diverse measurements for a user to have a hope of
>>> extrapolating to their use case.
>>>
>>> Kenn
>>>
>>> On Tue, Aug 22, 2017 at 7:22 PM, Kenneth Knowles <k...@google.com> wrote:
>>>> OK, so adding these good ideas to the list:
>>>>
>>>> 8. Plain-English summary that comes before the nitty-gritty.
>>>> 9. Comment on production readiness from maintainers. Maybe
>>>> testimonials are helpful if they can be obtained?
>>>> 10. Versioning of all of the above.
>>>>
>>>> Any more thoughts? I'll summarize in a JIRA in a bit.
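[Editor's note: to make the "custom merging windows" capability Etienne
mentions (BEAM-2759, BEAM-2499) concrete: merging WindowFns such as
sessions assign each element its own window and then collapse overlapping
windows into one. The sketch below shows that merge step in plain Java; it
is not Beam's actual `WindowFn.mergeWindows` API, and the class and method
names are illustrative.]

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Standalone sketch of session-style window merging: each element opens a
// window [timestamp, timestamp + gap), and overlapping windows are merged.
// This mirrors what a merging WindowFn does, but is NOT Beam's actual API.
public class SessionMergeSketch {

    // A window is a half-open interval [start, end), encoded as long[2].
    static List<long[]> mergeWindows(List<long[]> windows) {
        List<long[]> sorted = new ArrayList<>(windows);
        sorted.sort(Comparator.comparingLong(w -> w[0]));
        List<long[]> merged = new ArrayList<>();
        for (long[] w : sorted) {
            if (!merged.isEmpty() && w[0] <= merged.get(merged.size() - 1)[1]) {
                // Overlaps (or abuts) the previous window: extend it.
                long[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], w[1]);
            } else {
                merged.add(new long[] {w[0], w[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        long gap = 10;
        List<long[]> windows = new ArrayList<>();
        for (long t : new long[] {1, 5, 30}) {
            windows.add(new long[] {t, t + gap});
        }
        List<long[]> merged = mergeWindows(windows);
        // [1,11) and [5,15) merge into [1,15); [30,40) stays separate.
        System.out.println(merged.size());
    }
}
```

In Beam proper this logic lives behind `WindowFn.mergeWindows`, which is
the hook a runner must call for the "custom merging windows" row to be
supported; a runner that never invokes it can still pass the non-merging
rows, which is why the two rows can diverge per runner.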
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas
>>>> <g...@google.com.invalid> wrote:
>>>>> Hi, I'd also like to ask if versioning as proposed in BEAM-166
>>>>> <https://issues.apache.org/jira/browse/BEAM-166> is still
>>>>> relevant? If it is, would this be something we want to add to
>>>>> this proposal?
>>>>>
>>>>> G
>>>>>
>>>>> On 21 August 2017 at 08:31, Tyler Akidau <taki...@google.com.invalid> wrote:
>>>>>> Is there any way we could add quantitative runner metrics to
>>>>>> this as well? Like by having some benchmarks that process X
>>>>>> amount of data, and then detailing in the matrix the latency,
>>>>>> throughput, and (where possible) cost numbers for each of the
>>>>>> given runners? Semantic support is one thing, but there are
>>>>>> other differences between runners that aren't captured by just
>>>>>> checking feature boxes. I'd be curious if anyone has other
>>>>>> ideas in this vein as well. The benchmark idea might not be the
>>>>>> best way to go about it.
>>>>>>
>>>>>> -Tyler
>>>>>>
>>>>>> On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson
>>>>>> <je...@bigdatainstitute.io> wrote:
>>>>>>> It'd be awesome to see these updated. I'd add two more:
>>>>>>>
>>>>>>> 1. A plain-English summary of the runner's support in Beam.
>>>>>>> People who are new to Beam won't understand the in-depth
>>>>>>> coverage and need a general idea of how it is supported.
>>>>>>> 2. The production readiness of the runner. Does the maintainer
>>>>>>> think this runner is production ready?
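[Editor's note: for a sense of what Tyler's quantitative rows would
measure, here is a deliberately tiny throughput harness in plain Java.
It has nothing to do with Nexmark's actual drivers, and the names are
illustrative; real per-runner numbers would also need warmup, multiple
trials, and a real pipeline rather than a single in-process function.]

```java
import java.util.function.LongUnaryOperator;

// Toy throughput harness: time how long it takes to push N records through
// a per-record function, then report records/sec. Illustrative only; real
// runner benchmarks must account for warmup, I/O, parallelism, and skew.
public class ThroughputSketch {

    static double recordsPerSecond(LongUnaryOperator perRecordFn, long numRecords) {
        long start = System.nanoTime();
        long checksum = 0;
        for (long i = 0; i < numRecords; i++) {
            checksum += perRecordFn.applyAsLong(i);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        // Consume the checksum so the JIT cannot discard the loop entirely.
        if (checksum == Long.MIN_VALUE) throw new AssertionError();
        return numRecords / Math.max(seconds, 1e-9);
    }

    public static void main(String[] args) {
        double rps = recordsPerSecond(i -> i * 2 + 1, 1_000_000);
        System.out.printf("~%.0f records/sec%n", rps);
    }
}
```

The matrix could then report such numbers per runner alongside the
semantic rows, which is roughly what Nexmark does at pipeline scale.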
>>>>>>>
>>>>>>> On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles
>>>>>>> <k...@google.com.invalid> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I want to revamp
>>>>>>>> https://beam.apache.org/documentation/runners/capability-matrix/
>>>>>>>>
>>>>>>>> When Beam first started, we didn't work on feature branches
>>>>>>>> for the core runners, and they had a lot more gaps compared to
>>>>>>>> what goes on `master` today, so this tracked our progress in a
>>>>>>>> way that was easy for users to read. It is still our best/only
>>>>>>>> comparison page for users, but I think we could improve its
>>>>>>>> usefulness.
>>>>>>>>
>>>>>>>> For the benefit of the thread, let me inline all the
>>>>>>>> capabilities fully here:
>>>>>>>>
>>>>>>>> ========================
>>>>>>>>
>>>>>>>> "What is being computed?"
>>>>>>>> - ParDo
>>>>>>>> - GroupByKey
>>>>>>>> - Flatten
>>>>>>>> - Combine
>>>>>>>> - Composite Transforms
>>>>>>>> - Side Inputs
>>>>>>>> - Source API
>>>>>>>> - Splittable DoFn
>>>>>>>> - Metrics
>>>>>>>> - Stateful Processing
>>>>>>>>
>>>>>>>> "Where in event time?"
>>>>>>>> - Global windows
>>>>>>>> - Fixed windows
>>>>>>>> - Sliding windows
>>>>>>>> - Session windows
>>>>>>>> - Custom windows
>>>>>>>> - Custom merging windows
>>>>>>>> - Timestamp control
>>>>>>>>
>>>>>>>> "When in processing time?"
>>>>>>>> - Configurable triggering
>>>>>>>> - Event-time triggers
>>>>>>>> - Processing-time triggers
>>>>>>>> - Count triggers
>>>>>>>> - [Meta]data driven triggers
>>>>>>>> - Composite triggers
>>>>>>>> - Allowed lateness
>>>>>>>> - Timers
>>>>>>>>
>>>>>>>> "How do refinements relate?"
>>>>>>>> - Discarding
>>>>>>>> - Accumulating
>>>>>>>> - Accumulating & Retracting
>>>>>>>>
>>>>>>>> ========================
>>>>>>>>
>>>>>>>> Here are some issues I'd like to improve:
>>>>>>>>
>>>>>>>> - Rows that are impossible to not support (ParDo)
>>>>>>>> - Rows where "support" doesn't really make sense (Composite
>>>>>>>>   Transforms)
>>>>>>>> - Rows that are actually the same model feature (non-merging
>>>>>>>>   WindowFns)
>>>>>>>> - Rows that represent optimizations (Combine)
>>>>>>>> - Rows in the wrong place (Timers)
>>>>>>>> - Rows that have not been designed ([Meta]data driven triggers)
>>>>>>>> - Rows with names that appear nowhere else (Timestamp control)
>>>>>>>> - No place to compare non-model differences between runners
>>>>>>>>
>>>>>>>> I'm still pondering how to improve this, but I thought I'd
>>>>>>>> send the notion out for discussion. Some imperfect ideas I've
>>>>>>>> had:
>>>>>>>>
>>>>>>>> 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window)
>>>>>>>> into one row.
>>>>>>>> 2. Make sections as users see them, like "ParDo" / "Side
>>>>>>>> Inputs", not "What?" / "side inputs".
>>>>>>>> 3. Add rows for non-model things, like portability framework
>>>>>>>> support, metrics backends, etc.
>>>>>>>> 4. Drop rows that are not informative, like Composite
>>>>>>>> Transforms, or not designed.
>>>>>>>> 5. Reorganize the windowing section to be just support for
>>>>>>>> merging / non-merging windowing.
>>>>>>>> 6. Switch to a more distinct color scheme than the solid vs.
>>>>>>>> faded colors currently used.
>>>>>>>> 7. Find a web design to get short descriptions into the
>>>>>>>> foreground to make it easier to grok.
>>>>>>>>
>>>>>>>> These are just a few thoughts, and not necessarily compatible
>>>>>>>> with each other. What do you think?
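[Editor's note: the "How do refinements relate?" rows describe what each
trigger firing emits for a window: a discarding pane emits only the
elements that arrived since the last firing, while an accumulating pane
re-emits the running total. A standalone sketch of that difference for a
summing aggregation, in plain Java rather than Beam's actual pane
machinery; names are illustrative.]

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of accumulation modes for a summing aggregation.
// Each inner list is one "pane" of input; each trigger firing emits one
// result. This is NOT Beam's pane machinery, just the semantics.
public class AccumulationSketch {

    static List<Long> fire(List<List<Long>> panes, boolean accumulating) {
        List<Long> emitted = new ArrayList<>();
        long runningSum = 0;
        for (List<Long> pane : panes) {
            long paneSum = pane.stream().mapToLong(Long::longValue).sum();
            runningSum += paneSum;
            // Accumulating mode re-emits the running total; discarding
            // mode emits only what arrived since the previous firing.
            emitted.add(accumulating ? runningSum : paneSum);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<Long>> panes =
            List.of(List.of(1L, 2L), List.of(3L), List.of(4L));
        System.out.println(fire(panes, false)); // discarding panes
        System.out.println(fire(panes, true));  // accumulating panes
    }
}
```

"Accumulating & Retracting" would additionally emit a retraction of the
previous pane's value before the new total, so downstream consumers can
replace rather than double-count earlier results.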
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jesse
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com