Thanks Luke! I've had a pass.

-Max

On 28.08.20 01:22, Luke Cwik wrote:
As an update.

Direct and Twister2 are done.
Samza: is ready for review[1].
Flink: is almost ready for review. [2] lays all the groundwork for the migration and [3] finishes the migration (there is a timeout happening in FlinkSubmissionTest that I'm trying to figure out).
No further updates on Spark[4] or Jet[5].

@Maximilian Michels <mailto:m...@apache.org> or @t...@apache.org <mailto:thomas.we...@gmail.com>, can either of you take a look at the Flink PRs? @ke.wu...@icloud.com <mailto:ke.wu...@icloud.com>, Since Xinyu delegated to you, can you take another look at the Samza PR?

1: https://github.com/apache/beam/pull/12617
2: https://github.com/apache/beam/pull/12706
3: https://github.com/apache/beam/pull/12708
4: https://github.com/apache/beam/pull/12603
5: https://github.com/apache/beam/pull/12616

On Tue, Aug 18, 2020 at 11:42 AM Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com <mailto:pulasthi...@gmail.com>> wrote:

    Hi Luke

    Will take a look at this as soon as possible and get back to you.

    Best Regards,
    Pulasthi

    On Tue, Aug 18, 2020 at 2:30 PM Luke Cwik <lc...@google.com
    <mailto:lc...@google.com>> wrote:

        I have made some good progress here and have gotten to the
        following state for non-portable runners:

        DirectRunner[1]: Merged. Supports Read.Bounded and Read.Unbounded.
        Twister2[2]: Ready for review. Supports Read.Bounded, the
        current runner doesn't support unbounded pipelines.
        Spark[3]: WIP. Supports Read.Bounded, Nexmark suite passes. Not
        certain about level of unbounded pipeline support coverage since
        Spark uses its own tiny suite of tests to get unbounded pipeline
        coverage instead of the validates runner set.
        Jet[4]: WIP. Supports Read.Bounded. Read.Unbounded definitely
        needs additional work.
        Sazma[5]: WIP. Supports Read.Bounded. Not certain about level of
        unbounded pipeline support coverage since Spark uses its own
        tiny suite of tests to get unbounded pipeline coverage instead
        of the validates runner set.
        Flink: Unstarted.

        @Pulasthi Supun Wickramasinghe <mailto:pulasthi...@gmail.com> ,
        can you help me with the Twister2 PR[2]?
        @Ismaël Mejía <mailto:ieme...@gmail.com>, is PR[3] the expected
        level of support for unbounded pipelines and hence ready for review?
        @Jozsef Bartok <mailto:jo...@hazelcast.com>, can you help me out
        to get support for unbounded splittable DoFn's into Jet[4]?
        @Xinyu Liu <mailto:xinyuliu...@gmail.com>, is PR[5] the expected
        level of support for unbounded pipelines and hence ready for review?

        1: https://github.com/apache/beam/pull/12519
        2: https://github.com/apache/beam/pull/12594
        3: https://github.com/apache/beam/pull/12603
        4: https://github.com/apache/beam/pull/12616
        5: https://github.com/apache/beam/pull/12617

        On Tue, Aug 11, 2020 at 10:55 AM Luke Cwik <lc...@google.com
        <mailto:lc...@google.com>> wrote:

            There shouldn't be any changes required since the wrapper
            will smoothly transition the execution to be run as an SDF.
            New IOs should strongly prefer to use SDF since it should be
            simpler to write and will be more flexible but they can use
            the "*Source"-based APIs. Eventually we'll deprecate the
            APIs but we will never stop supporting them. Eventually they
            should all be migrated to use SDF and if there is another
            major Beam version, we'll finally be able to remove them.

            On Tue, Aug 11, 2020 at 8:40 AM Alexey Romanenko
            <aromanenko....@gmail.com <mailto:aromanenko....@gmail.com>>
            wrote:

                Hi Luke,

                Great to hear about such progress on this!

                Talking about opt-out for all runners in the future,
                will it require any code changes for current
                “*Source”-based IOs or the wrappers should completely
                smooth this transition?
                Do we need to require to create new IOs only based on
                SDF or again, the wrappers should help to avoid this?

                On 10 Aug 2020, at 22:59, Luke Cwik <lc...@google.com
                <mailto:lc...@google.com>> wrote:

                In the past couple of months wrappers[1, 2] have been
                added to the Beam Java SDK which can execute
                BoundedSource and UnboundedSource as Splittable DoFns.
                These have been opt-out for portable pipelines (e.g.
                Dataflow runner v2, XLang pipelines on Flink/Spark)
                and opt-in using an experiment for all other pipelines.

                I would like to start making the non-portable
                pipelines starting with the DirectRunner[3] to be
                opt-out with the plan that eventually all runners will
                only execute splittable DoFns and the
                BoundedSource/UnboundedSource specific execution logic
                from the runners will be removed.

                Users will be able to opt-in any pipeline using the
                experiment 'use_sdf_read' and opt-out with the
                experiment 'use_deprecated_read'. (For portable
                pipelines these experiments were 'beam_fn_api' and
                'beam_fn_api_use_deprecated_read' respectively and I
                have added these two additional aliases to make the
                experience less confusing).

                1:
                
https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L275
                2:
                
https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L449
                3: https://github.com/apache/beam/pull/12519



-- Pulasthi S. Wickramasinghe
    PhD Candidate  | Research Assistant
    School of Informatics and Computing | Digital Science Center
    Indiana University, Bloomington
    cell: 224-386-9035 <tel:(224)%20386-9035>

Reply via email to