Thanks for suggestions! If there are no objections I'll start with removing
<https://issues.apache.org/jira/browse/BEAM-7962> 1.5 & 1.6 versions.

Kenn: I really like your approach! ;) I'll definitely try it after the
removal.

D.

On Sun, Sep 8, 2019 at 8:38 PM Kenneth Knowles <[email protected]> wrote:

> In my suggestion, there is no hacking around with source sets at all.
> There is no significant logic in the gradle configuration, which is a goal
> IMO.
>
> On Sun, Sep 8, 2019 at 3:15 AM Maximilian Michels <[email protected]> wrote:
>
>> One minor addition, the "*1.5_to_1.7_utils" directory you listed, is
>> already present as the "1.5" directory. Admittedly, it could be renamed to
>> make that clearer.
>>
>> *
>> On 08.09.19 10:06, Maximilian Michels wrote:
>> > Thanks for your thoughts, David!
>> >
>> > Thanks Luke, copybara looks interesting.
>> >
>> > > Thomas and Max expressed needs for addressing the issue with the
>> > > current release.
>> >
>> > I think there is a misunderstanding. Thomas and I expressed the need to
>> > review the PR properly before merging it to avoid introducing technical
>> > debt. The expectation to open a feature PR a couple days before the
>> > release cut and have it, a) reviewed, b) tested, and c) merged, is a lot
>> > to ask for the maintainers. We are currently in stage (a) after the
>> > first round of reviews.
>> >
>> > 1.5/1.6 and possibly 1.7 will be removed. This was announced earlier on
>> > the mailing list and is tracked here:
>> > https://jira.apache.org/jira/browse/BEAM-7962
>> >
>> > The amount of duplicated files code is very little, two very simple
>> > files. After removing 1.5/1.6/1.7 there should be none left. In the PR
>> > it looks like there were unnecessary duplications of files. That's why I
>> > commented.
>> >
>> > Due to the above reasons, I don't see the need to change the current way
>> > of cross-compiling for Flink versions. Out of David proposals, I still
>> > think the current approach is the easiest and least error-prone.
>> >
>> > Kenn, I appreciate your thoughts. How is your approach different from
>> > the current, expect introducing an additional factory method? The source
>> > layout is effectively identical to the current. We still have to retain
>> > different source directories for each Flink version where Flink internal
>> > interfaces change.
>> >
>> > > Now the code indicates exactly what is going on and all builds are
>> > > straightforward. All dependencies are actual build dependencies, not
>> > > src dir hacks.
>> >
>> > Source directories are build dependencies. I can see a minor improvement
>> > though in code readability (and that's always great), but still the
>> > developer needs to know from which source directory the factory is going
>> > to be loaded from, which is very similar to the current approach.
>> >
>> > Thanks,
>> > Max
>> >
>> > On 07.09.19 20:37, Kenneth Knowles wrote:
>> > > What about...
>> > >
>> > > *5) Architect the code for sharing using straightforward builds and
>> > > dependencies*
>> > >
>> > > Each versioned directory has an implementation of FlinkRunner
>> > > constructed from the common source, with version-specific pieces
>> > > injected (not with a complex framework, but by just passing as a
>> > > parameter).
>> > >
>> > > I just read over the actual amount of stuff you would have to inject.
>> > > It is very small at least through 1.8. Basically:
>> > >
>> > >      FlinkRunnerBuilder(
>> > >        Function<Coder, TypeSerializer> coderTypeSerializerFactory,
>> > >        Function<ExecutionConfig, TypeSerializer>
>> > > encodedValueTypeSerializerFactory).build() // returns a FlinkRunner
>> > >
>> > > Now the code indicates exactly what is going on and all builds are
>> > > straightforward. All dependencies are actual build dependencies, not
>> > > src dir hacks.
>> > >
>> > > *flink/*
>> > > + *common/ *# move src/ to common/src/ for hygiene; compiles against
>> > > all supported Flink versions
>> > >     - build.gradle
>> > >     + *src/*
>> > >        -
>> org/apache/beam/runners/flink/common*/FlinkRunnerBuilder.java*
>> > > **- ... # the main runner implementation here
>> > > + *1.5_to_1.7_utils/*/ # common pluggable parts that work for 1.5
>> > > through 1.7/
>> > >     - build.gradle # maybe makes sense to cross-compile and unit test
>> > > against 1.5 through 1.7, though dirs below give coverage too
>> > > + *src/*
>> > >
>> - org/apache/beam/runners/flink/v1_5_to_1_7/*CoderTypeSerializerFactory.java*
>> > >
>> > >
>> - 
>> org/apache/beam/runners/flink/v1_5_to_1_7/*EncodedValueTypeSerializerFactory.java*
>> > >
>> > > + *1.5/*
>> > >     - build.gradle
>> > >     + *src/ */# implementation of FlinkRunner compatible with 1.5,
>> > > could have some of its own logic plugged in to FlinkRunnerBuilder, and
>> > > some 1.5_to_1.7 utils/*
>> > > *
>> > >        - org/apache/beam/runners/flink/*FlinkRunner.java */#
>> > > FlinkRunnerBuilder(new v1_5_to_1_7.CoderTypeSerializerFactory(), new
>> > > /v1_5_to_1_7./EncodedValueTypeSerializerFactory())/
>> > > + *1.6/*
>> > >     - build.gradle
>> > >     + *src/ */# implementation of FlinkRunner compatible with 1.6,
>> > > actually has none of its own logic but it could/*
>> > > *
>> > >        - org/apache/beam/runners/flink/*FlinkRunner.java */#
>> > > FlinkRunnerBuilder(new v1_5_to_1_7.CoderTypeSerializerFactory(), new
>> > > /v1_5_to_1_7./EncodedValueTypeSerializerFactory())/
>> > > + *1.7/*
>> > >     + *src/ */# implementation of FlinkRunner compatible with 1.7,
>> > > actually has none of its own logic but it could/*
>> > > *
>> > >        - org/apache/beam/runners/flink/*FlinkRunner.java */#
>> > > FlinkRunnerBuilder(new v1_5_to_1_7.CoderTypeSerializerFactory(), new
>> > > /v1_5_to_1_7./EncodedValueTypeSerializerFactory())/
>> > > + *1.8/*
>> > >     - build.gradle
>> > > + *src/*
>> > >        - org/apache/beam/runners/flink/*FlinkRunner.java */#
>> > > FlinkRunnerBuilder(new v1_8_CoderTypeSerializerFactory(), new
>> > > EncodedValueTypeSerializerFactory())/
>> > > - org/apache/beam/runners/flink/*v1_8/CoderTypeSerializerFactory.java*
>> > >
>> - org/apache/beam/runners/flink/*v1_8/EncodedValueTypeSerializerFactory.java*
>> > >
>> > >
>> > > Kenn
>> > >
>> > > On Sat, Sep 7, 2019 at 9:00 AM Lukasz Cwik <[email protected]
>> > > <mailto:[email protected]>> wrote:
>> > >
>> > >     When we import the Beam code into Google, we also run into issues
>> > >     where sometimes we need to transform parts of the code. During
>> > >     import we use copybara[1] to do these transformations to the
>> source
>> > >     which are more then just copy file X from some other path since
>> most
>> > >     of the time we want to change only a few lines and this really
>> helps
>> > >     reduce the maintenance pain. Unfortunately I don't see a Gradle
>> > >     plugin for copybara but I do imagine there is a plugin that allows
>> > >     one to run SED like expressions or other transformations instead
>> of
>> > >     just maintaining duplicate copies of files.
>> > >
>> > >     1: https://github.com/google/copybara
>> > >
>> > >     On Sat, Sep 7, 2019 at 3:37 AM David Morávek <[email protected]
>> > >     <mailto:[email protected]>> wrote:
>> > >
>> > >         Hello,
>> > >
>> > >         we currently have an opened PR for Flink 1.9
>> > >         <https://github.com/apache/beam/pull/9296>, which greatly
>> > >         improves the runner for batch use-case. In case the PR gets
>> > >         merged, we would be supporting 5 latest major versions of
>> Flink,
>> > >         which obviously come with high maintenance price and makes
>> > >         future development harder (there are already a sub-optimal
>> parts
>> > >         due to compatibility with previous versions). Thomas and Max
>> > >         expressed needs for addressing the issue with the current
>> > > release.
>> > >
>> > >         Let's break down possible solution for the problem.
>> > >
>> > >         *1) Current solution*
>> > >         *
>> > >         *
>> > >         Currently we maintain separate build for each version. The
>> > >         project structure looks as follows:
>> > >
>> > >         *flink/*
>> > >         + *1.5/
>> > >         *
>> > >             + *src/*/# implementation of classes that differ between
>> > >         versions/*
>> > >         *
>> > >             - build.gradle
>> > >         + *1.6/*
>> > >             + build.gradle #/the version is backward compatible, so it
>> > >         can reuse "overrides" from 1.5/
>> > >         + *1.7/*
>> > >             + build.gradle #/the version is backward compatible, so it
>> > >         can reuse "overrides" from 1.5/
>> > >         + *1.8/*
>> > >         + *src/ */# implementation of classes that differ between
>> > > versions/
>> > >             - build.gradle
>> > >         + *1.9/*
>> > >         + *src///*/# implementation of classes that differ between
>> > > versions/
>> > >             - build.gradle
>> > >         + *src/*/# common source, shared among runner versions
>> > >         /
>> > >         - flink_runner.gradle/# included by  each
>> <version>/build.gradle
>> > >         /
>> > >
>> > >         The problem with this structure is, that we always need to
>> copy
>> > >         all of the version specific classes between backward
>> > >         incompatible versions, which results in *duplicate files* (we
>> > >         can not simply override a single file, because it wouldn't
>> > >         compile due to duplicate classes).
>> > >         *
>> > >         *
>> > >         *2) Symlink duplicates*
>> > >         *
>> > >         *
>> > >         Maybe we can simply symlink duplicates between versions and
>> only
>> > >         override the files that need to be changed?*
>> > >         *
>> > >
>> > >         *3) Adjusting the gradle build*
>> > >         *
>> > >         *
>> > >         Currently a version build looks something like this (this one
>> is
>> > >         for 1.7.x version):
>> > >         *
>> > >         *
>> > >         project.ext {
>> > >            // Set the version of all Flink-related dependencies here.
>> > >            flink_version = '1.7.2'
>> > >            // Main source directory and Flink version specific code.
>> > >            main_source_dirs = ["$basePath/src/main/java",
>> > >         "../1.5/src/main/java"]
>> > >            test_source_dirs = ["$basePath/src/test/java",
>> > >         "../1.5/src/test/java"]
>> > >            main_resources_dirs = ["$basePath/src/main/resources"]
>> > >            test_resources_dirs = ["$basePath/src/test/resources"]
>> > >            archives_base_name = 'beam-runners-flink-1.7'
>> > >         }
>> > >
>> > >         // Load the main build script which contains all build logic.
>> > >         apply from: "$basePath/flink_runner.gradle"
>> > >
>> > >         It basically says, take the common source and append version
>> > >         specific implementations from 1.5 version. Let's say we want
>> to
>> > >         override a single file for 1.8. We need to copy everything
>> from
>> > >         1.5/src and the build file would look as follows:
>> > >
>> > >         /* All properties required for loading the Flink build script
>> */
>> > >         project.ext {
>> > >            // Set the version of all Flink-related dependencies here.
>> > >            flink_version = '1.8.0'
>> > >            // Main source directory and Flink version specific code.
>> > >            main_source_dirs = ["$basePath/src/main/java",
>> > > "./src/main/java"]
>> > >            test_source_dirs = ["$basePath/src/test/java",
>> > > "./src/test/java"]
>> > >            main_resources_dirs = ["$basePath/src/main/resources"]
>> > >            test_resources_dirs = ["$basePath/src/test/resources"]
>> > >            archives_base_name = 'beam-runners-flink-1.8'
>> > >         }
>> > >
>> > >         // Load the main build script which contains all build logic.
>> > >         apply from: "$basePath/flink_runner.gradle"
>> > >
>> > >         For simplicity, let's only focus on *main_source_dirs*. What
>> we
>> > >         really want to do is to tell the build, to use everything from
>> > >         1.5 and override a single class (e.g. CoderTypeSerializer).
>> > >
>> > >         def copyOverrides = tasks.register('copyOverrides', Copy) {
>> > >            it.from '../1.5/src/', './src'
>> > >            it.into "${project.buildDir}/flink-overrides/src"
>> > >            it.duplicatesStrategy DuplicatesStrategy.INCLUDE // The
>> last
>> > >         duplicate file 'wins'.
>> > >         }
>> > >
>> > >         compileJava.dependsOn copyOverrides
>> > >
>> > >         projext.ext {
>> > >            main_source_dirs = ["$basePath/src/main/java",
>> > >         "${project.buildDir}/flink-overrides/src/main/java"]
>> > >         }
>> > >
>> > >         This would copy all overrides into build directory, and it
>> case
>> > >         of duplicate it picks the latest one. Than the build would
>> > >         simple compile classes from the newly created java files in
>> > >         build directory.
>> > >
>> > >         *4) Maintaining last 3 major versions only*
>> > >
>> > >         I recall that Flink community only supports 3 latest major
>> > >         versions <https://flink.apache.org/downloads.html> (please
>> > >         correct me if I'm mistaken). I suggest the the*Beam would do
>> the
>> > >         same*. There is already an opened BEAM-7962
>> > >         <https://jira.apache.org/jira/browse/BEAM-7962> that suggest
>> > >         dropping 1.5 & 1.6 versions. Maybe this would allow us to keep
>> > >         the current structure with bearable amount of technical debt?
>> > >
>> > >         Personally I'm in favor of *4)* combined with *3)*.
>> > >
>> > >         What do you think? Do you have any other suggestions how to
>> > >         solve this?
>> > >
>> > >         Thanks,
>> > >         D.
>> > >
>>
>>

Reply via email to