[
https://issues.apache.org/jira/browse/BEAM-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pablo Estrada updated BEAM-2888:
--------------------------------
Description:
The goal for this project has changed: We now want to create a completely new
Capability Matrix that is based on the ValidatesRunner tests that we run on the
various Apache Beam runners.
We can use the tests in ./test-infra/validates-runner/ to generate a JSON file
that records the capabilities supported by the various runners and tested by
each individual test.
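As a rough illustration only, here is a minimal sketch of what such a generated
file could look like, written as plain Java serialized with Gson. The class
names, field names, and JSON shape are assumptions for illustration, not the
actual output format of the test-infra module:
{code:java}
// Hypothetical sketch only: the real generator in test-infra/validates-runner/
// may use a different schema entirely.
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import java.util.List;
import java.util.Map;

public class CapabilityReportSketch {

  // One entry per ValidatesRunner test, linking the capability it exercises
  // to the runners on which it passes.
  static class TestResult {
    final String testName;
    final String capability;
    final Map<String, Boolean> supportByRunner;

    TestResult(String testName, String capability, Map<String, Boolean> supportByRunner) {
      this.testName = testName;
      this.capability = capability;
      this.supportByRunner = supportByRunner;
    }
  }

  public static void main(String[] args) {
    Gson gson = new GsonBuilder().setPrettyPrinting().create();
    List<TestResult> report =
        List.of(
            new TestResult(
                "GroupByKeyTest.testGroupByKey",
                "GroupByKey",
                Map.of("DirectRunner", true, "FlinkRunner", true)));
    // Prints JSON along the lines of:
    // [
    //   {
    //     "testName": "GroupByKeyTest.testGroupByKey",
    //     "capability": "GroupByKey",
    //     "supportByRunner": { "DirectRunner": true, "FlinkRunner": true }
    //   }
    // ]
    System.out.println(gson.toJson(report));
  }
}
{code}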
----------------------------------------------------
Discussion:
[https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E]
Summarizing that discussion, we have a lot of issues/wishes. Some can be
addressed as one-offs and some need a unified reorganization of the runner
comparison.
Basic corrections:
- Remove rows that are impossible not to support (ParDo)
- Remove rows where "support" doesn't really make sense (Composite transforms)
- Deduplicate rows that are actually the same model feature (all non-merging
windowing / all merging windowing)
- Clearly separate rows that represent optimizations (Combine)
- Correct rows in the wrong place (Timers are actually a "what...?" row)
- Separate or remove rows for features that have not been designed ([Meta]Data
driven triggers, retractions)
- Rename rows with names that appear nowhere else (Timestamp control, which
is called a TimestampCombiner in Java)
- Switch to a more distinct color scheme for full/partial support (currently
just solid/faded colors)
- Switch to something clearer than "~" for partial support, versus ✘ and ✓ for
none and full.
- Correct Gearpump support for merging windows (see BEAM-2759)
- Correct Spark support for non-merging and merging windows (see BEAM-2499)
Minor rewrites:
- Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
- Make sections match how users see them, like "ParDo" / "Side inputs" rather
than "What?" / "side inputs"
- Add rows for non-model things, like portability framework support, metrics
backends, etc.
Bigger rewrites:
- Add versioning to the comparison, as in BEAM-166
- Find a way to fit in a plain-English summary of each runner's support for
Beam. It should come first, as it is what new users need before getting to the
details.
- Find a way to describe the production readiness of runners and/or
testimonials from those using them in production.
- Have a place to compare non-model differences between runners
Changes requiring engineering efforts:
- Gather and add quantitative runner metrics: perhaps Nexmark results as
mid-level benchmarks, smaller benchmarks for measuring aspects of specific
features, and larger end-to-end benchmarks to get an idea of how a runner
might actually perform on a real use case
- Tighter coupling of the matrix portion of the comparison with tags on
ValidatesRunner tests (see the sketch below)
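As a rough sketch of what that coupling could look like: Java ValidatesRunner
tests are already tagged with JUnit's @Category(ValidatesRunner.class), so a
matrix generator could additionally key off a capability annotation. The
@Capability annotation below is invented here purely for illustration; no such
annotation exists today:
{code:java}
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.ValidatesRunner;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;

public class CountValidatesRunnerTest {

  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  @Category(ValidatesRunner.class)
  // Hypothetical tag the matrix generator would read to map this test
  // to a capability-matrix row, e.g.:
  // @Capability("Combine")
  public void testCountGlobally() {
    PCollection<Long> count =
        p.apply(Create.of("a", "b", "a")).apply(Count.globally());
    PAssert.that(count).containsInAnyOrder(3L);
    p.run().waitUntilFinish();
  }
}
{code}
A generator walking such tags could then emit one entry per test into the JSON
file sketched earlier, keeping the matrix in sync with what is actually tested.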
If you care to address some aspect of this, please reach out and/or just file a
subtask and address it.
> Runner Comparison / Capability Matrix revamp
> --------------------------------------------
>
> Key: BEAM-2888
> URL: https://issues.apache.org/jira/browse/BEAM-2888
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Kenneth Knowles
> Priority: P3
> Labels: full-time, gsoc2022, gsod, gsod2019, gsod2022, mentor
> Time Spent: 19h
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.20.1#820001)