Thanks for your thoughts, David!

Thanks Luke, copybara looks interesting.

Thomas and Max expressed needs for addressing the issue with the current 
release.

I think there is a misunderstanding. Thomas and I expressed the need to review the PR properly before merging it to avoid introducing technical debt. The expectation to open a feature PR a couple days before the release cut and have it, a) reviewed, b) tested, and c) merged, is a lot to ask for the maintainers. We are currently in stage (a) after the first round of reviews.

1.5/1.6 and possibly 1.7 will be removed. This was announced earlier on the mailing list and is tracked here: https://jira.apache.org/jira/browse/BEAM-7962

The amount of duplicated files code is very little, two very simple files. After removing 1.5/1.6/1.7 there should be none left. In the PR it looks like there were unnecessary duplications of files. That's why I commented.

Due to the above reasons, I don't see the need to change the current way of cross-compiling for Flink versions. Out of David proposals, I still think the current approach is the easiest and least error-prone.

Kenn, I appreciate your thoughts. How is your approach different from the current, expect introducing an additional factory method? The source layout is effectively identical to the current. We still have to retain different source directories for each Flink version where Flink internal interfaces change.

Now the code indicates exactly what is going on and all builds are 
straightforward. All dependencies are actual build dependencies, not src dir 
hacks.

Source directories are build dependencies. I can see a minor improvement though in code readability (and that's always great), but still the developer needs to know from which source directory the factory is going to be loaded from, which is very similar to the current approach.

Thanks,
Max

On 07.09.19 20:37, Kenneth Knowles wrote:
What about...

*5) Architect the code for sharing using straightforward builds and dependencies*

Each versioned directory has an implementation of FlinkRunner constructed from the common source, with version-specific pieces injected (not with a complex framework, but by just passing as a parameter).

I just read over the actual amount of stuff you would have to inject. It is very small at least through 1.8. Basically:

     FlinkRunnerBuilder(
       Function<Coder, TypeSerializer> coderTypeSerializerFactory,
      Function<ExecutionConfig, TypeSerializer> encodedValueTypeSerializerFactory).build() // returns a FlinkRunner

Now the code indicates exactly what is going on and all builds are straightforward. All dependencies are actual build dependencies, not src dir hacks.

*flink/*
+ *common/ *# move src/ to common/src/ for hygiene; compiles against all supported Flink versions
    - build.gradle
    + *src/*
       - org/apache/beam/runners/flink/common*/FlinkRunnerBuilder.java*
**- ... # the main runner implementation here
+ *1.5_to_1.7_utils/*/ # common pluggable parts that work for 1.5 through 1.7/    - build.gradle # maybe makes sense to cross-compile and unit test against 1.5 through 1.7, though dirs below give coverage too
+ *src/*
- org/apache/beam/runners/flink/v1_5_to_1_7/*CoderTypeSerializerFactory.java* - org/apache/beam/runners/flink/v1_5_to_1_7/*EncodedValueTypeSerializerFactory.java*
+ *1.5/*
    - build.gradle
   + *src/ */# implementation of FlinkRunner compatible with 1.5, could have some of its own logic plugged in to FlinkRunnerBuilder, and some 1.5_to_1.7 utils/*
*
      - org/apache/beam/runners/flink/*FlinkRunner.java */# FlinkRunnerBuilder(new v1_5_to_1_7.CoderTypeSerializerFactory(), new /v1_5_to_1_7./EncodedValueTypeSerializerFactory())/
+ *1.6/*
    - build.gradle
   + *src/ */# implementation of FlinkRunner compatible with 1.6, actually has none of its own logic but it could/*
*
      - org/apache/beam/runners/flink/*FlinkRunner.java */# FlinkRunnerBuilder(new v1_5_to_1_7.CoderTypeSerializerFactory(), new /v1_5_to_1_7./EncodedValueTypeSerializerFactory())/
+ *1.7/*
   + *src/ */# implementation of FlinkRunner compatible with 1.7, actually has none of its own logic but it could/*
*
      - org/apache/beam/runners/flink/*FlinkRunner.java */# FlinkRunnerBuilder(new v1_5_to_1_7.CoderTypeSerializerFactory(), new /v1_5_to_1_7./EncodedValueTypeSerializerFactory())/
+ *1.8/*
    - build.gradle
+ *src/*
      - org/apache/beam/runners/flink/*FlinkRunner.java */# FlinkRunnerBuilder(new v1_8_CoderTypeSerializerFactory(), new EncodedValueTypeSerializerFactory())/ - org/apache/beam/runners/flink/*v1_8/CoderTypeSerializerFactory.java* - org/apache/beam/runners/flink/*v1_8/EncodedValueTypeSerializerFactory.java*

Kenn

On Sat, Sep 7, 2019 at 9:00 AM Lukasz Cwik <[email protected] <mailto:[email protected]>> wrote:

    When we import the Beam code into Google, we also run into issues
    where sometimes we need to transform parts of the code. During
    import we use copybara[1] to do these transformations to the source
    which are more then just copy file X from some other path since most
    of the time we want to change only a few lines and this really helps
    reduce the maintenance pain. Unfortunately I don't see a Gradle
    plugin for copybara but I do imagine there is a plugin that allows
    one to run SED like expressions or other transformations instead of
    just maintaining duplicate copies of files.

    1: https://github.com/google/copybara

    On Sat, Sep 7, 2019 at 3:37 AM David Morávek <[email protected]
    <mailto:[email protected]>> wrote:

        Hello,

        we currently have an opened PR for Flink 1.9
        <https://github.com/apache/beam/pull/9296>, which greatly
        improves the runner for batch use-case. In case the PR gets
        merged, we would be supporting 5 latest major versions of Flink,
        which obviously come with high maintenance price and makes
        future development harder (there are already a sub-optimal parts
        due to compatibility with previous versions). Thomas and Max
        expressed needs for addressing the issue with the current release.

        Let's break down possible solution for the problem.

        *1) Current solution*
        *
        *
        Currently we maintain separate build for each version. The
        project structure looks as follows:

        *flink/*
        + *1.5/
        *
            + *src/*/# implementation of classes that differ between
        versions/*
        *
            - build.gradle
        + *1.6/*
            + build.gradle #/the version is backward compatible, so it
        can reuse "overrides" from 1.5/
        + *1.7/*
            + build.gradle #/the version is backward compatible, so it
        can reuse "overrides" from 1.5/
        + *1.8/*
        + *src/ */# implementation of classes that differ between versions/
            - build.gradle
        + *1.9/*
        + *src///*/# implementation of classes that differ between versions/
            - build.gradle
        + *src/*/# common source, shared among runner versions
        /
        - flink_runner.gradle/# included by  each <version>/build.gradle
        /

        The problem with this structure is, that we always need to copy
        all of the version specific classes between backward
        incompatible versions, which results in *duplicate files* (we
        can not simply override a single file, because it wouldn't
        compile due to duplicate classes).
        *
        *
        *2) Symlink duplicates*
        *
        *
        Maybe we can simply symlink duplicates between versions and only
        override the files that need to be changed?*
        *

        *3) Adjusting the gradle build*
        *
        *
        Currently a version build looks something like this (this one is
        for 1.7.x version):
        *
        *
        project.ext {
           // Set the version of all Flink-related dependencies here.
           flink_version = '1.7.2'
           // Main source directory and Flink version specific code.
           main_source_dirs = ["$basePath/src/main/java",
        "../1.5/src/main/java"]
           test_source_dirs = ["$basePath/src/test/java",
        "../1.5/src/test/java"]
           main_resources_dirs = ["$basePath/src/main/resources"]
           test_resources_dirs = ["$basePath/src/test/resources"]
           archives_base_name = 'beam-runners-flink-1.7'
        }

        // Load the main build script which contains all build logic.
        apply from: "$basePath/flink_runner.gradle"

        It basically says, take the common source and append version
        specific implementations from 1.5 version. Let's say we want to
        override a single file for 1.8. We need to copy everything from
        1.5/src and the build file would look as follows:

        /* All properties required for loading the Flink build script */
        project.ext {
           // Set the version of all Flink-related dependencies here.
           flink_version = '1.8.0'
           // Main source directory and Flink version specific code.
           main_source_dirs = ["$basePath/src/main/java", "./src/main/java"]
           test_source_dirs = ["$basePath/src/test/java", "./src/test/java"]
           main_resources_dirs = ["$basePath/src/main/resources"]
           test_resources_dirs = ["$basePath/src/test/resources"]
           archives_base_name = 'beam-runners-flink-1.8'
        }

        // Load the main build script which contains all build logic.
        apply from: "$basePath/flink_runner.gradle"

        For simplicity, let's only focus on *main_source_dirs*. What we
        really want to do is to tell the build, to use everything from
        1.5 and override a single class (e.g. CoderTypeSerializer).

        def copyOverrides = tasks.register('copyOverrides', Copy) {
           it.from '../1.5/src/', './src'
           it.into "${project.buildDir}/flink-overrides/src"
           it.duplicatesStrategy DuplicatesStrategy.INCLUDE // The last
        duplicate file 'wins'.
        }

        compileJava.dependsOn copyOverrides

        projext.ext {
           main_source_dirs = ["$basePath/src/main/java",
        "${project.buildDir}/flink-overrides/src/main/java"]
        }

        This would copy all overrides into build directory, and it case
        of duplicate it picks the latest one. Than the build would
        simple compile classes from the newly created java files in
        build directory.

        *4) Maintaining last 3 major versions only*

        I recall that Flink community only supports 3 latest major
        versions <https://flink.apache.org/downloads.html> (please
        correct me if I'm mistaken). I suggest the the*Beam would do the
        same*. There is already an opened BEAM-7962
        <https://jira.apache.org/jira/browse/BEAM-7962> that suggest
        dropping 1.5 & 1.6 versions. Maybe this would allow us to keep
        the current structure with bearable amount of technical debt?

        Personally I'm in favor of *4)* combined with *3)*.

        What do you think? Do you have any other suggestions how to
        solve this?

        Thanks,
        D.

Reply via email to