Thank you for your comments. Let me try to summarize what has been discussed so far:

1. The Beam LTS version will ensure a stable execution engine for the duration of the LTS life span.

2. We agree that pushing updates to the execution engine for the Runners is only desirable if it results in a better integration with the Beam model or if it is necessary due to security or performance reasons.

3. We might have to consider adding additional build targets for a Runner whenever the execution engine gets upgraded. This should be straightforward if the engine's API remains stable. It may also be desirable when the upgrade path is neither easy nor completely foreseeable, e.g. Etienne mentioned the Spark 1.x vs Spark 2.x Runner. The Beam feature set could vary depending on the engine version.

4. In the long run, we want a stable abstraction layer for each Runner that, ideally, is maintained upstream by the execution engine's community. In the short run, this is probably not realistic, as the shared libraries of Beam are not stable enough.


On 13.09.18 14:39, Robert Bradshaw wrote:
The ideal long-term solution is, as Romain mentions, pushing the runner-specific code up to be maintained by each runner, with a stable API for talking to Beam. Unfortunately, I think we're still a long way from having this stable API, or from having the clout for non-Beam developers to maintain these bindings externally (though hopefully we'll get there).

In the short term, we're stuck with either hurting users that want to stick with Flink 1.5, hurting users that want to upgrade to Flink 1.6, or supporting both. Is Beam's interaction with Flink such that we can't simply have separate targets linking the same Beam code against one or the other? (I.e. are code changes needed?) If so, we'll probably need a flink-runner-1.5 module, a flink-runner-1.6 module, and a flink-runner-common module. Or we hope that all users are happy with 1.5 until a certain point in time when they all want to jump to 1.6 and the new Beam simultaneously. Maybe that's enough in the short term, but longer term we need a more sustainable solution.
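
To make the module split concrete, here is a minimal sketch of what the version-specific modules could look like, assuming a hypothetical FlinkApiShim interface in flink-runner-common (none of these names exist in Beam today):

    // Hypothetical sketch only; the shared code in flink-runner-common
    // would see nothing but this interface.
    interface FlinkApiShim {
      String flinkVersion();
      void submitJob(String jobName);
    }

    // Lives in flink-runner-1.5, compiled against the Flink 1.5 client API.
    class Flink15Shim implements FlinkApiShim {
      @Override public String flinkVersion() { return "1.5"; }
      @Override public void submitJob(String jobName) {
        // Would delegate to the Flink 1.5 client here.
        System.out.println("Submitting " + jobName + " via a Flink 1.5 client");
      }
    }

    // Lives in flink-runner-1.6, absorbing the 1.6 breaking changes
    // behind the same interface.
    class Flink16Shim implements FlinkApiShim {
      @Override public String flinkVersion() { return "1.6"; }
      @Override public void submitJob(String jobName) {
        System.out.println("Submitting " + jobName + " via a Flink 1.6 client");
      }
    }

flink-runner-common would then stay source-compatible with both versions, and users would pick the artifact that matches their cluster.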


On Thu, Sep 13, 2018 at 7:13 AM Romain Manni-Bucau <[email protected]> wrote:

    Hi guys,

    Isn't the issue "only" that Beam has this code instead of the engines?

    Assuming the Beam runner-facing API is stable - which must be the
    case anyway - and that each engine has its own integration
    (flink-beam instead of beam-runners-flink), then this issue
    disappears by construction.

    It also has the advantage of better maintenance.

    Side note: this is what happened with Arquillian: originally the
    community did all the adapter implementations, then each vendor
    took them back in house to make them better.

    Any way to work in that direction maybe?

    On Thu, Sep 13, 2018 at 00:49, Thomas Weise <[email protected]> wrote:

        The main problem here is that users are forced to upgrade
        infrastructure to obtain new features in Beam, even when those
        features actually don't require such changes. As an example,
        another update to Flink 1.6.0 was proposed (without supporting
        new functionality in Beam) and we already know that it breaks
        compatibility (again).

        I think that upgrading to a Flink X.Y.0 version isn't a good
        idea to start with. But besides that, if we want to grow
        adoption, then we need to focus on stability and delivering
        improvements to Beam without disrupting users.

        In this specific case, ideally the surface of Flink would be
        backward compatible, allowing us to stick to a minimum version
        and still be able to submit pipelines to Flink endpoints of
        higher versions. Some work in that direction is underway (like
        versioning the REST API). FYI, targeting the lowest common
        version is what most projects that depend on Hadoop 2.x do.

        Since Beam with the Flink 1.5.x client won't talk to Flink 1.6
        and code changes are required to make it compile, we would need
        to come up with a more involved strategy to support multiple
        Flink versions. Until then, I would prefer we favor existing
        users over short-lived experiments, which would mean sticking
        with 1.5.x and not supporting 1.6.0.

        Thanks,
        Thomas


        On Wed, Sep 12, 2018 at 1:15 PM Lukasz Cwik <[email protected]> wrote:

            As others have already suggested, I also believe LTS
            releases are the best we can do as a community right now,
            until portability allows us to decouple what a user writes
            with and how it runs (the SDK and the SDK environment) from
            the runner (job service + shared common runner libs +
            Flink/Spark/Dataflow/Apex/Samza/...).

            Dataflow would be highly invested in having the appropriate
            tooling within Apache Beam to support multiple SDK versions
            against a runner. This in turn would allow people to use
            any SDK with any runner and, as Robert mentioned, certain
            optimizations and features would be disabled depending on
            the capabilities of the runner and of the SDK.



            On Wed, Sep 12, 2018 at 6:38 AM Robert Bradshaw
            <[email protected]> wrote:

                The target audience is people who want to use the latest
                Beam but do not want to use the latest version of the
                runner, right?

                I think this will be somewhat (though not entirely)
                addressed by Beam LTS releases, where those not wanting
                to upgrade the runner at least have a well-supported
                version of Beam. In the long term, we have the division

                     Runner <-> BeamRunnerSpecificCode <-> CommonBeamRunnerLibs <-> SDK.

                (which applies to the job submission as well as execution).

                Insomuch as the BeamRunnerSpecificCode uses the public
                APIs of the runner, hopefully upgrading the runner for
                minor versions should be a no-op, and we can target the
                lowest version of the runner that makes sense, allowing
                the user to link against higher versions at his or her
                discretion. We should provide build targets that allow
                this. For major versions, it may make sense to have two
                distinct BeamRunnerSpecificCode libraries (which may or
                may not share some common code). I hope these wrappers
                are not too thick.

                There is a tight coupling at the BeamRunnerSpecificCode
                <-> CommonBeamRunnerLibs layer, but hopefully the bulk
                of the code lives on the right-hand side and can be
                updated as needed independently of the runner. There
                may be code of the form "if the runner supports X, do
                this fast path; otherwise, do this slow path (or reject
                the pipeline)".
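
                As a purely hypothetical illustration of that pattern
                (RunnerCapabilities and the translator below are
                invented names, not Beam APIs):

                    // Invented for illustration only.
                    interface RunnerCapabilities {
                      boolean supportsNativeX();
                    }

                    class CapabilityAwareTranslator {
                      void translateX(RunnerCapabilities caps) {
                        if (caps.supportsNativeX()) {
                          // Fast path: hand X to the engine directly.
                          System.out.println("Using native X support.");
                        } else {
                          // Slow path: emulate X in the common runner
                          // libs, or reject the pipeline if emulation
                          // is impossible.
                          System.out.println("Falling back to emulated X.");
                        }
                      }
                    }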

                I hope the CommonBeamRunnerLibs <-> SDK coupling is
                fairly loose, to the point that one could use SDKs from
                different versions of Beam (or even developed outside of
                Beam) with an older/newer runner. We may need to add
                versioning to the Fn/Runner/Job API itself to support
                this. Right now of course we're still in a pre-1.0,
                rapid-development phase wrt this API.
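
                Purely as a hypothetical sketch of such versioning
                (these names are invented, not part of the actual
                Fn/Runner/Job API):

                    // Invented for illustration only; the real API may
                    // end up versioning itself quite differently.
                    class FnApiVersion {
                      final int major;
                      final int minor;

                      FnApiVersion(int major, int minor) {
                        this.major = major;
                        this.minor = minor;
                      }

                      // SDK and runner would exchange versions at job
                      // submission and proceed only if compatible,
                      // e.g. same major version.
                      boolean compatibleWith(FnApiVersion other) {
                        return this.major == other.major;
                      }
                    }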




                On Wed, Sep 12, 2018 at 2:10 PM Etienne Chauchot
                <[email protected]> wrote:

                    Hi Max,

                    I totally agree with your points, especially the
                    users' priorities (stick to the already-working
                    version) and the need to leverage important new
                    features. It is indeed a difficult balance to find.

                    I can speak for the part I know: for the Spark
                    runner, the aim was to support Spark's native
                    Dataset API (in place of the RDD API). For that we
                    needed to upgrade to Spark 2.x (and we will
                    probably leverage Beam Row as well).
                    But such an upgrade is a good amount of work, which
                    makes it difficult to commit to a schedule such as
                    "if there is a major new feature on an execution
                    engine that we want to leverage, then the upgrade
                    in Beam will be done within x months".

                    Regarding your point on portability: decoupling the
                    SDK from the runner via the runner harness and the
                    SDK harness might make pipeline authors' work
                    easier regarding pipeline maintenance. But, still,
                    if we upgrade the runner libs, users' runner
                    harness might not work with their engine version.
                    If such SDK/runner decoupling were 100% functional,
                    we could imagine having multiple runner harnesses
                    shipping different versions of the runner libs to
                    solve this problem. But we would need to support
                    more than one version of the runner libs. We chose
                    not to do this in the Spark runner.

                    WDYT?

                    Best
                    Etienne


                    On Tue, Sep 11, 2018 at 15:42 +0200, Maximilian
                    Michels wrote:
                    Hi Beamers,

                    In the light of the discussion about Beam LTS
                    releases, I'd like to kick off a thread about how
                    often we upgrade the execution engine of each
                    Runner. By upgrade, I mean major/minor versions
                    which typically break the binary compatibility of
                    Beam pipelines.

                    For the Flink Runner, we try to track the latest
                    stable version. Some users reported that this can
                    be problematic, as it requires them to potentially
                    upgrade their Flink cluster with a new version of
                    Beam.

                    From a developer's perspective, it makes sense to
                    migrate as early as possible to the newest version
                    of the execution engine, e.g. to leverage the
                    newest features. From a user's perspective, you
                    don't care about the latest features if your use
                    case still works with Beam.

                    We have to please both parties. So I'd suggest
                    upgrading the execution engine whenever necessary
                    (e.g. critical new features, end of life of the
                    current version). On the other hand, the upcoming
                    Beam LTS releases will contain a longer-supported
                    version.

                    Maybe we don't need to discuss much about this, but
                    I wanted to hear what the community has to say
                    about it. Particularly, I'd be interested in how
                    the other Runner authors intend to handle it.

                    As far as I understand, with portability being
                    stable, we could theoretically upgrade the SDK
                    without upgrading the runtime components. That
                    would allow us to defer the upgrade for a longer
                    time.

                    Best,
                    Max
