[ 
https://issues.apache.org/jira/browse/BEAM-2888?focusedWorklogId=241883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-241883
 ]

ASF GitHub Bot logged work on BEAM-2888:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/May/19 17:18
            Start Date: 14/May/19 17:18
    Worklog Time Spent: 10m 
      Work Description: xinyuiscool commented on pull request #8576: 
[BEAM-2888] Add the not-yet-fully-designed drain and checkpoint to runner 
comparison
URL: https://github.com/apache/beam/pull/8576#discussion_r283911070
 
 

 ##########
 File path: website/src/_data/capability-matrix.yml
 ##########
 @@ -1367,3 +1367,102 @@ categories:
             l1: 'No'
             l2: pending model support
             l3: ''
+  - description: Additional features
+    anchor: misc
+    color-b: 'aaa'
+    color-y: 'bbb'
+    color-p: 'ccc'
+    color-n: 'ddd'
+    rows:
+      - name: Drain
+        values:
+          - class: model
+            l1: 'Partially'
+            l2: 
+            l3: APIs and semantics for draining a pipeline are under 
discussion. This would cause incomplete aggregations to be emitted regardless 
of trigger and tagged with metadata indicating they are incomplete.
+          - class: dataflow
+            l1: 'Partially'
+            l2: 
+            l3: Dataflow has a native drain operation, but it does not work in 
the presence of event time timer loops. Final implementation pending model 
support.
+          - class: flink
+            l1: 
+            l2: 
+            l3: 
+          - class: spark
+            l1: 
+            l2: 
+            l3: 
+          - class: apex
+            l1: 
+            l2: 
+            l3: 
+          - class: gearpump
+            l1: 
+            l2: 
+            l3: 
+          - class: mapreduce
+            l1: 
+            l2: 
+            l3: 
+          - class: jstorm
+            l1:
+            l2: 
+            l3: 
+          - class: ibmstreams
+            l1: 
+            l2: 
+            l3: 
+          - class: samza
+            l1: 
+            l2: 
+            l3: 
+          - class: nemo
+            l1: 
+            l2: 
+            l3:
+      - name: Checkpoint
+        values:
+          - class: model
+            l1: 'Partially'
+            l2: 
+            l3: APIs and semantics for saving a pipeline checkpoint are under 
discussion. This would be a runner-specific materialization of the pipeline 
state required to resume or duplicate the pipeline. 
+          - class: dataflow
+            l1: 'No'
+            l2: 
+            l3: 
+          - class: flink
+            l1: 'Partially'
+            l2: 
+            l3: Flink has a native savepoint capability.
+          - class: spark
+            l1: 
+            l2: 
+            l3: 
+          - class: apex
+            l1: 
+            l2: 
+            l3: 
+          - class: gearpump
+            l1: 
+            l2: 
+            l3: 
+          - class: mapreduce
+            l1: 
+            l2: 
+            l3: 
+          - class: jstorm
+            l1: 
+            l2: 
+            l3: 
+          - class: ibmstreams
+            l1: 
+            l2: 
+            l3: 
+          - class: samza
 
 Review comment:
   For Samza, it uses its native checkpoint capability. Checkpointing happens 
periodically based on user config; the checkpoints are saved in a metadata 
store or stream, and state is flushed during checkpointing. 
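
   A rough sketch of what the Samza checkpoint entry could look like, following 
the same class/l1/l2/l3 format as the rows above (the 'Partially' rating and 
wording are only a suggestion based on the description here, not a decision 
from this PR):

          - class: samza
            l1: 'Partially'
            l2: 
            l3: Samza uses its native checkpoint capability. Checkpoints are taken periodically based on user config, saved to a metadata store or stream, and state is flushed during checkpointing.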
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 241883)
    Time Spent: 1h  (was: 50m)

> Runner Comparison / Capability Matrix revamp
> --------------------------------------------
>
>                 Key: BEAM-2888
>                 URL: https://issues.apache.org/jira/browse/BEAM-2888
>             Project: Beam
>          Issue Type: Improvement
>          Components: website
>            Reporter: Kenneth Knowles
>            Assignee: Griselda Cuevas Zambrano
>            Priority: Major
>              Labels: gsod, gsod2019
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Discussion: 
> https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E
> Summarizing that discussion, we have a lot of issues/wishes. Some can be 
> addressed as one-offs and some need a unified reorganization of the runner 
> comparison.
> Basic corrections:
>  - Remove rows that are impossible to not support (ParDo)
>  - Remove rows where "support" doesn't really make sense (Composite 
> transforms)
>  - Deduplicate rows that are actually the same model feature (all non-merging 
> windowing / all merging windowing)
>  - Clearly separate rows that represent optimizations (Combine)
>  - Correct rows in the wrong place (Timers are actually a "what...?" row)
>  - Separate or remove rows that have not been designed ([Meta]Data driven 
> triggers, retractions)
>  - Rename rows with names that appear nowhere else (Timestamp control, which 
> is called a TimestampCombiner in Java)
>  - Switch to a more distinct color scheme for full/partial support (currently 
> just solid/faded colors)
>  - Switch to something clearer than "~" for partial support, versus ✘ and ✓ 
> for none and full.
>  - Correct Gearpump support for merging windows (see BEAM-2759)
>  - Correct Spark support for non-merging and merging windows (see BEAM-2499)
> Minor rewrites:
>  - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
>  - Make sections as users see them, like "ParDo" / "Side Inputs", not "What?" 
> / "side inputs"
>  - Add rows for non-model things, like portability framework support, metrics 
> backends, etc.
> Bigger rewrites:
>  - Add versioning to the comparison, as in BEAM-166
>  - Find a way to fit in a plain-English summary of each runner's support for 
> Beam. It should come first, as it is what new users need before getting to 
> details.
>  - Find a way to describe the production readiness of runners and/or 
> testimonials from those using them in production.
>  - Have a place to compare non-model differences between runners
> Changes requiring engineering efforts:
>  - Gather and add quantitative runner metrics, perhaps Nexmark results for 
> mid-level, smaller benchmarks for measuring aspects of specific features, and 
> larger end-to-end benchmarks to get an idea of how a runner might actually 
> perform on a use case
>  - Tighter coupling of the matrix portion of the comparison with tags on 
> ValidatesRunner tests
> If you care to address some aspect of this, please reach out and/or just file 
> a subtask and address it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
