[ 
https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-2888:
--------------------------------
    Description: 
The goal for this project has changed: we now want to create a completely new 
Capability Matrix based on the ValidatesRunner tests that we run on the 
various Apache Beam runners.

We can use the tests in ./test-infra/validates-runner/ to generate a JSON file 
that contains the capabilities supported by the various runners, as exercised 
by each individual test.
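
As a rough illustration (the field names here are hypothetical, not an agreed 
schema), each entry in that file might map a test to the model capabilities it 
exercises and the per-runner result. A minimal sketch using Jackson:

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;

// Hypothetical shape of one entry in the generated capability file;
// the real schema would be defined by the validates-runner tooling.
public class CapabilityEntry {
  public String test;                  // a ValidatesRunner test (illustrative name)
  public List<String> capabilities;    // model features the test exercises
  public Map<String, Boolean> runners; // runner name -> whether the test passes

  public static void main(String[] args) throws Exception {
    CapabilityEntry e = new CapabilityEntry();
    e.test = "ParDoTest.testValueState";
    e.capabilities = List.of("stateful-processing");
    e.runners = Map.of("flink", true, "spark", true);
    // Prints one JSON object; a generator would emit one entry per test.
    System.out.println(new ObjectMapper().writeValueAsString(e));
  }
}
{code}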

----------------------------------------------------

 

Discussion: 
[https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E]

Summarizing that discussion, we have a lot of issues/wishes. Some can be 
addressed as one-offs and some need a unified reorganization of the runner 
comparison.

Basic corrections:
 - Remove rows that are impossible not to support (ParDo)
 - Remove rows where "support" doesn't really make sense (Composite transforms)
 - Deduplicate rows that are actually the same model feature (all non-merging 
windowing / all merging windowing; see the sketch after this list)
 - Clearly separate rows that represent optimizations (Combine)
 - Move rows that are in the wrong place (Timers are actually a "What...?" row)
 - Separate or remove rows for features that have not been designed ([Meta]Data 
driven triggers, retractions)
 - Rename rows whose names appear nowhere else (Timestamp control, which is 
called TimestampCombiner in Java; also illustrated below)
 - Switch to a more distinct color scheme for full/partial support (currently 
just solid/faded colors)
 - Switch to something clearer than "~" for partial support, versus ✘ and ✓ for 
none and full.
 - Correct Gearpump support for merging windows (see BEAM-2759)
 - Correct Spark support for non-merging and merging windows (see BEAM-2499)
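
To make the windowing and naming items above concrete, here is a minimal Beam 
Java sketch (assuming a runner such as the direct runner on the classpath): 
FixedWindows is a non-merging WindowFn, Sessions is a merging one, and the 
matrix's "Timestamp control" corresponds to the Java SDK's TimestampCombiner:

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.TimestampCombiner;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class WindowingRows {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<String> input = p.apply(Create.of("a", "b"));

    // Non-merging windowing: FixedWindows, SlidingWindows, etc. all
    // exercise the same model feature and could share one matrix row.
    input.apply("NonMerging",
        Window.into(FixedWindows.of(Duration.standardMinutes(1))));

    // Merging windowing: Sessions merges windows of nearby elements.
    // "Timestamp control" in the matrix is TimestampCombiner in Java.
    input.apply("Merging",
        Window.<String>into(Sessions.withGapDuration(Duration.standardMinutes(10)))
            .withTimestampCombiner(TimestampCombiner.EARLIEST));

    p.run().waitUntilFinish();
  }
}
{code}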

Minor rewrites:
 - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
 - Make sections match what users see, like "ParDo" / "Side Inputs" rather than 
"What?" / "side inputs"
 - Add rows for non-model things, like portability framework support, metrics 
backends, etc.

Bigger rewrites:
 - Add versioning to the comparison, as in BEAM-166
 - Find a way to fit in a plain-English summary of each runner's support in 
Beam. It should come first, as it is what new users need before getting to the 
details.
 - Find a way to describe the production readiness of runners and/or 
testimonials from those using them in production.
 - Have a place to compare non-model differences between runners

Changes requiring engineering efforts:
 - Gather and add quantitative runner metrics: perhaps Nexmark results at the 
mid-level, smaller benchmarks measuring aspects of specific features, and 
larger end-to-end benchmarks to get an idea of how a runner might actually 
perform on a real use case
 - Tighter coupling of the matrix portion of the comparison with tags on 
ValidatesRunner tests (see the sketch after this list)
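
For that last item, the JUnit categories on existing tests already carry most 
of the needed signal; here is a sketch of how a tagged test could feed the 
matrix (the category-to-row mapping is an assumption, not an existing 
mechanism):

{code:java}
import org.apache.beam.sdk.testing.UsesStatefulParDo;
import org.apache.beam.sdk.testing.ValidatesRunner;
import org.junit.Test;
import org.junit.experimental.categories.Category;

public class TaggedCapabilityTest {
  // A matrix generator could map the UsesStatefulParDo category to a
  // "State" row and record, per runner, whether this test passes.
  @Test
  @Category({ValidatesRunner.class, UsesStatefulParDo.class})
  public void testValueState() {
    // ... build a TestPipeline exercising ValueState and assert with PAssert ...
  }
}
{code}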

If you care to address some aspect of this, please reach out and/or just file a 
subtask and address it.



> Runner Comparison / Capability Matrix revamp
> --------------------------------------------
>
>                 Key: BEAM-2888
>                 URL: https://issues.apache.org/jira/browse/BEAM-2888
>             Project: Beam
>          Issue Type: Improvement
>          Components: website
>            Reporter: Kenneth Knowles
>            Priority: P3
>              Labels: full-time, gsoc2022, gsod, gsod2019, gsod2022, mentor
>          Time Spent: 19h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
