Re: [Input needed] Capability Matrix Visual Redesign for extended version

Kenneth Knowles Wed, 06 Jan 2021 09:48:44 -0800

Very good questions. Answers inline.

On Wed, Jan 6, 2021 at 8:16 AM Agnieszka Sell <[email protected]>
wrote:


> Hi Kenneth,
>
> Thank you for your feedback about the Capability Matrix! I have several
> questions about it:
>
> *Feedback: I think we can also remove rows that are not started or not 
> complete in the Beam Model, and remove the Beam Model column.*
> Question: If we remove the Beam Model column, the whole point of making it
> static and showing the capabilities would be lost. Isn't the point to show 
> capabilities of Beam vs. other tools?
>
>
To clarify the purpose of the capability matrix: it is not comparing Beam
vs other tools. It is comparing adapters that run a Beam pipeline on top of
other tools. For example, the "Apache Spark" column describes the
capabilities of Beam's "SparkRunner", not Spark itself. Maybe we need to
adjust the wording above the matrix to make this clear.

So the column with the title "What is being computed?" is already a full
list of the features of the Beam Model. The rows where "Beam Model" has an
"X" or "~" are just ideas for future work, or features still in progress.

> *Feedback: I think Splittable DoFn really just deserves one row for
> bounded, one for unbounded, and any caveats go in the details.*
> Question: What would it look like? All this in one matrix or separate?
>
>
I suggest adding it as a row in "What is being computed?", so the rows read
ParDo, GroupByKey, ..., Stateful Processing, Splittable DoFn.


>
> *Feedback: All the windowing rows can be condensed into "Basic windowing 
> support" and "Merging windowing support" and any runner that can only run a 
> couple WindowFns can have details in the caveats. At this point any runner 
> that doesn't do Windowing by invoking a user's WindowFn simply doesn't really 
> support windowing in the model.*
> Suggestion: Do we still have a separate matrix for only two(?) rows?
>
>
My opinion may be controversial... I don't care that much about splitting
What/Where/When/How. It is especially confusing to use "Where" to talk
about event time.

Personally, I would merge the last three tables into a single table,
"Windowing and Triggering", with the rows "Basic windowing support",
"Merging windowing support", "Configurable triggering", "Allowed lateness",
"Discarding mode", and "Accumulating mode". I would remove Timers from that
table and rename "Stateful processing" in the table above to "State &
timers", since these are really one feature taken together.
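
A minimal sketch of that regrouping, in case it helps whoever edits the
matrix data. It assumes the rows can be handled as plain strings; the names
CONDENSED_ROW, WINDOWING_AND_TRIGGERING, and condense() are hypothetical and
not taken from the Beam website sources. The trigger mapping follows the
earlier suggestion in this thread that "Configurable triggering" absorb the
individual trigger rows.

```python
# Hypothetical sketch: none of these names come from the Beam website sources.

# Old fine-grained row -> proposed condensed row.
CONDENSED_ROW = {
    "Event-time triggers": "Configurable triggering",
    "Processing-time triggers": "Configurable triggering",
    "Count triggers": "Configurable triggering",
    "Composite triggers": "Configurable triggering",
    "Stateful processing": "State & timers",
    "Timers": "State & timers",
}

# Rows of the proposed single "Windowing and Triggering" table.
WINDOWING_AND_TRIGGERING = [
    "Basic windowing support",
    "Merging windowing support",
    "Configurable triggering",
    "Allowed lateness",
    "Discarding mode",
    "Accumulating mode",
]


def condense(rows):
    """Map old rows to condensed rows, keeping first-seen order, no duplicates."""
    seen = {}
    for row in rows:
        # Rows without a mapping are kept under their existing name.
        seen.setdefault(CONDENSED_ROW.get(row, row), None)
    return list(seen)
```

Calling condense() on the current row titles would give the shorter row list;
rows that are not mentioned in the mapping pass through unchanged.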

Many of those decisions are not really part of the redesign, but just ideas
to save space. If you need more space savings, I can find more. For
example, there is really no value in keeping ParDo, GroupByKey, and Flatten
as separate rows. If you don't have all of those implemented, you don't have
a Beam runner at all, so those rows will never differ between runners. They
could be omitted, or collapsed into a single "Baseline runner" row that
carries any caveats. The existing caveats are unnecessary, for example:
Spark has a caveat on GroupByKey that is really about triggers, and
Structured Streaming has "~" but the details are not actually caveats.

Kenn


> Kind regards,
>
> Agnieszka
>
> On Mon, Dec 21, 2020 at 7:49 PM Griselda Cuevas <[email protected]> wrote:
>
>> Thanks Kenn, this is super helpful.
>>
>>
>>
>> On Mon, 21 Dec 2020 at 09:57, Kenneth Knowles <[email protected]> wrote:
>>
>>> For the capability matrix, part of the problem is that the rows don't
>>> all make that much sense, as we've discussed a couple times.
>>>
>>> But assuming we keep the content identical, maybe we could just have the
>>> collapsed view and make the table selectable where *just* the selected cell
>>> controls content below? You won't be able to do side-by-side comparisons of
>>> the full text of things, but you will be able to keep the overview and
>>> drill in one at a time quickly. Just one idea.
>>>
>>> A couple ways to save space without rearchitecting it:
>>>
>>>  - Apache Hadoop MapReduce and JStorm can be removed as they are on
>>> branches, not released.
>>>  - I think we can also remove rows that are not started or not complete
>>> in the Beam Model, and remove the Beam Model column.
>>>  - I think Splittable DoFn really just deserves one row for bounded, one
>>> for unbounded, and any caveats go in the details.
>>>  - All the windowing rows can be condensed into "Basic windowing
>>> support" and "Merging windowing support" and any runner that can only run a
>>> couple WindowFns can have details in the caveats. At this point any runner
>>> that doesn't do Windowing by invoking a user's WindowFn simply doesn't
>>> really support windowing in the model.
>>>  - "Configurable triggering" can absorb "Event-time triggers",
>>> "Processing-time triggers", "Count triggers", and "Composite triggers".
>>> Same. At this point any runner that doesn't support the whole triggering
>>> language doesn't really support triggers fully.
>>>
>>> Kenn
>>>
>>> On Mon, Dec 14, 2020 at 7:39 PM Griselda Cuevas <[email protected]> wrote:
>>>
>>>> Hi folks, another page that's getting a refresh this time around is the
>>>> Capability Matrix, which is one of the most critical pages for users as
>>>> they evaluate the current support for each of the Beam runners.
>>>>
>>>> The situation we'd like to get your input on is: How do we optimize the
>>>> expanded version of the capability matrix, which explains the level of
>>>> support in each of the functions?
>>>>
>>>> Right now the text gets in the way of analyzing the table and makes
>>>> reading hard. You can see a screenshot in the Beam wiki here [1], the file
>>>> is titled current_CapMatExt.
>>>>
>>>> One of the proposed solutions is that after clicking the link "(click
>>>> to expand details)", we load a new page that has the corresponding table to
>>>> the click (what, where, when, how) at the top, and all the content of each
>>>> runner/function gets displayed at the bottom of the page, the file with the
>>>> proposed design is also in the Beam wiki here [1] and the file's name is
>>>> proposed_CapMatExt. This solution isn't perfect either, since we'd need to
>>>> move too much text under the table and reading isn't much easier.
>>>>
> >>>> Do you have suggestions/ideas on how to condense the extended version?
>>>>
>>>> Share with us your feedback through this week,
>>>> Thanks!
>>>> G
>>>>
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/BEAM/Website+Redesign+Files
>>>>
>>>
>
> --
>
> Agnieszka Sell
> Polidea <https://www.polidea.com/> | Project Manager
>
> M: *+48 504 901 334* <+48504901334>
> E: [email protected]
>
