I've had an offline discussion with Till and we came to the following conclusion:

All outlined approaches work perfectly when Flink is built in its entirety with Maven. None of the outlined approaches works when Flink is _not_ built in its entirety.

The source-file generation approach has problems similar to the properties-file one; if I build flink-runtime, switch to another branch, and then build flink-dist, the information displayed in the UI when I start the cluster may be incorrect. Put another way, you can still easily run into the issue of "doesn't work on /my/ machine", the solution to which is always to rebuild Flink completely. Frustratingly enough, with both approaches you can even run into the issue where the information _seems_ correct when it actually isn't; but this isn't something that we can really solve.

As such, this is no longer really a question of correctness, but one of convenience and maintenance overhead.

As for maintenance, I think a good argument can be made that the generation approach would be easier to maintain, in part because it is the more standard approach for this kind of thing.

The properties-file approach, however, clearly wins in convenience, as it neither requires an additional command to be run nor causes compile errors. The compile error is quite a problem since it provides no information about what the actual problem is, or rather how to solve it.

Overall we concluded that we would like to stick to the existing approach of having a properties file (or JSON or whatever).

I saw that you updated the PR, and will take a look shortly.

On a personal note, I'm sympathetic to your idea, but people are quite sensitive to things interrupting their workflow (which I have run into a few times already :/ ), so we put a lot of emphasis on that aspect.

On 16/04/2020 16:05, Niels Basjes wrote:
Hi,

Apparently the IDEs (like IntelliJ) are imperfect at supporting the DEVs in these kinds of use cases. The solution you get is what you have in IntelliJ: a "generate-sources" button.
It is a way of working I see in almost all projects I'm involved with: almost all either use a parser generator (like javacc or Antlr) or generate code from a specification (like Avro or Protobuf). In all of these cases there is no "default" value you can use, so this discussion never arose in those projects.

All other options (other than 'hard' generating a file/class without having defaults) involve having some kind of "default" (regardless of whether it is a properties file, a JSON file or a Java interface). This will lead to the illusion of "it works on my machine" or "why doesn't it work on my machine".
What I also see is that these options which include "defaults" need quite a bit of extra code to handle all the situations related to loading the settings (either through resource loading or classloading).

The goal I was working on is that the docker image that is loaded is tagged with a flink+scala version. If the DEV does not generate the right values, or has values from a different branch, then a different docker image may be used than what is expected, leading to "why doesn't it work on my machine".

So yes, it is an additional learning step for the DEVs, and yes, it takes a bit of getting used to. Yet I truly believe it is the best direction.

Niels


On Wed, Apr 15, 2020 at 5:26 PM Till Rohrmann <trohrm...@apache.org> wrote:

I'm not advocating for a specific approach. The point I wanted to make is that there are solutions which allow us to get rid of the problematic parsing without disrupting the workflow. If the Jackson JSON file approach works for Niels, then I'm fine with that as well.

Cheers,
Till

On Wed, Apr 15, 2020 at 5:21 PM Chesnay Schepler <ches...@apache.org> wrote:

It doesn't have to be a properties file, nor do we necessarily have to do any manual parsing. It could just be a JSON file that we point Jackson at. Naturally we could also generate it with Jackson.
You'd have a POJO for all the fields with sane defaults (an analogue to the proposed generated POJO) and 5-6 lines at most for loading the file and handling errors.

I don't think this would be such a problem.
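
A minimal sketch of that idea, assuming a hypothetical file name and POJO (illustrative, not the actual Flink code):

    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.InputStream;

    // POJO with sane defaults; Jackson only overwrites fields present in the JSON file.
    public class EnvironmentInfo {
        public String flinkVersion = "unknown";
        public String scalaVersion = "unknown";
        public String gitCommitId = "unknown";

        public static EnvironmentInfo load() {
            try (InputStream in =
                    EnvironmentInfo.class.getResourceAsStream("/environment-info.json")) {
                if (in == null) {
                    return new EnvironmentInfo(); // file missing: use the defaults
                }
                return new ObjectMapper().readValue(in, EnvironmentInfo.class);
            } catch (Exception e) {
                return new EnvironmentInfo(); // unreadable file: use the defaults
            }
        }
    }

The loading and error handling really is just a handful of lines; everything else lives in the generated JSON file.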

If I understand Till correctly, he's proposing a service-loader approach, which seems overkill to me. It also introduces some inconsistency in what one needs to do before working on stuff, which I'm apprehensive about.
On 15/04/2020 16:31, Till Rohrmann wrote:
Hi everyone,

thanks for starting this discussion Niels.

I like the idea of getting rid of parsing a Properties instance. On the other hand, I also understand that people are concerned about disrupting the workflows of our devs.

Maybe we can find a compromise between both approaches. For example, one could define an EnvironmentInformationProvider interface which is loaded at runtime. The implementation of this interface could be generated by `mvn generate-sources`. Moreover, if we cannot find it, then we fail with a descriptive exception saying to run `mvn generate-sources`. That way only devs who rely on the EnvironmentInformation have to run `mvn generate-sources` when testing/developing, and we could still get rid of all the parsing and logic of a properties file. What do you think?
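
A rough sketch of that compromise, assuming the generated implementation is registered as a Java service (all names here are made up for illustration):

    import java.util.Iterator;
    import java.util.ServiceLoader;

    public interface EnvironmentInformationProvider {
        String getFlinkVersion();
        String getScalaVersion();

        static EnvironmentInformationProvider load() {
            // The implementing class (plus its META-INF/services entry) would be
            // emitted by `mvn generate-sources`; if it is absent, fail loudly.
            Iterator<EnvironmentInformationProvider> it =
                    ServiceLoader.load(EnvironmentInformationProvider.class).iterator();
            if (!it.hasNext()) {
                throw new IllegalStateException(
                        "No EnvironmentInformationProvider found on the classpath. "
                                + "Please run 'mvn generate-sources' first.");
            }
            return it.next();
        }
    }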

Cheers,
Till

On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <ni...@basjes.nl> wrote:

Hi,

On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <danrtsey...@gmail.com> wrote:

Although the latter option is more stable, I think it is not acceptable for all the developers to execute `mvn generate-sources` first. Otherwise, the Flink project is just broken and could not run tests or Flink jobs in the IDE.


It is important to realize that this discussion essentially revolves around the fact that systems like IntelliJ (what I use) handle the various ways of generating code that have been defined in a Maven pom.xml in different ways. Simply put: they don't automatically execute all instructions that have been defined in the pom.xml when needed.

     - With the properties file variant (which uses resource filtering), IntelliJ does the resource filtering automatically, yet no Maven plugins are run that affect the properties being replaced. So the git variables are NOT populated when running in IntelliJ, and I have seen that several values in the properties file are nonsense.
     As a consequence, the Java code reading those needs to catch things like "missing properties file" and various kinds of "nonsense values" and replace them with a workable default (see the sketch after this list).
     So when running this code you are actually running something different from what will be run when doing a `mvn clean install`, yet the developer is under the illusion it all works because of the code that catches all of those problems. Depending on the way these variables are used, this may lead to "it fails in production but it works fine in IntelliJ" situations.
     In my mind, the code that catches all of those exceptional situations in poorly generated properties only exists so that developers can run the (otherwise perfectly fine) software in a "broken" build/development environment.


     - With the way I propose to generate the code, the effect is indeed slightly harder: if you do not run the pom.xml based code generation, it will not work. I understand that this requires the developers to think a bit more about the code they are working on.
     The upside is that the code either works perfectly or does not compile. There is no "it compiles but is really nonsense". I prefer this.
     Also, at this point in time this is already true for Flink: there are quite a few modules where the developer MUST run `mvn generate-sources` for all tests to run successfully. Like I indicated, there is a javacc SQL parser, and there are a lot of tests that rely on generating Avro code before they are able to run. I have several projects where systems like Avro and Antlr force me in this direction; there is simply no other way to do this.
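
Coming back to the first point: here is a sketch of the kind of defensive code the properties-file variant forces (property names, file name and defaults are illustrative, not the exact Flink code):

    import java.io.InputStream;
    import java.util.Properties;

    public final class VersionInfo {

        public static String getGitCommitId() {
            Properties props = new Properties();
            try (InputStream in =
                    VersionInfo.class.getResourceAsStream("/flink-version.properties")) {
                if (in != null) {
                    props.load(in);
                }
            } catch (Exception e) {
                // Swallow: the file may be missing entirely when running in the IDE.
            }
            String commit = props.getProperty("git.commit.id", "unknown");
            // If Maven filtering never ran, the raw placeholder "${git.commit.id}"
            // is still in the file: treat that nonsense value as unknown too.
            return commit.startsWith("${") ? "unknown" : commit;
        }
    }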

So I think the version properties file is enough for now. +1 for the first option.

Like I said, I'm fine with either choice by the committers. I would choose the `mvn generate-sources` code variant as it is much simpler and much more reliable.

Niels

Niels Basjes <ni...@basjes.nl> wrote on Thu, Apr 9, 2020 at 4:47 PM:
Hi,

I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to make more build-time variables (like the scala version) available in the code at runtime.

During the review process there was discussion around a basic question: *Is generating java code during the build ok?*
See

     - https://github.com/apache/flink/pull/11245#discussion_r400035133
     - https://github.com/apache/flink/pull/11592
     - https://github.com/apache/flink/pull/11592#issuecomment-610282450

As suggested by Chesnay Schepler I'm putting this question to the mailing list.

https://github.com/apache/flink/pull/11592#issuecomment-610963947
The main discussion was around the ease of use when running in an IDE like IntelliJ.

So essentially we have two solution directions available:

     1. *Generate a properties file and then use the classloader to load this file as a resource and then parse it as a property file.*
     This is the currently used solution direction for this part of the code. A rough version of this (to be improved):
     https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
     This method has several effects:
        1. The developer can run the project immediately from within the IDE, as fallback values are provided if the 'actual' values are missing.
        2. This property file (with stuff that should never be overwritten) can be modified by placing a different one in the classpath. In fact it IS modified in flink-dist, as it generates a new file with the same name into the binary distribution (I consider this to be bad).
        3. Loading resources means loading, parsing and a lot of error handling. Lots of things "can be null" or be a default value, so the values are unreliable and lots of code needs to handle this. In fact, when running from IntelliJ the properties file is generated poorly most of the time; only during a normal maven build will it work correctly.
     2. *Generate a Java source file and then simply compile this and make it part of the project.*
     Essentially the same model as you would have when using Apache Avro, Protobuf, Antlr 4 and javacc (several of those are used in Flink!). A rough version of this (to be improved):
     https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
     This method has several effects (a sketch of such a generated file follows after this list):
        1. The developer MUST run 'mvn generate-sources' before the project can be run from within the IDE, as no fallback values are provided.
        2. The code/test will not run until this step is done.
        3. Because the file is generated by a plugin, it is always correct. As a consequence, all variables are always available and the downstream users no longer have to handle the "can be null" or "default value" situations.
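
To illustrate, a sketch of what such a generated file could look like (package, class name and values are made up; the real file would be emitted by a Maven plugin from a template):

    // GENERATED by the build; do not edit. Regenerate with 'mvn generate-sources'.
    package org.apache.flink.runtime.util;

    public final class GeneratedEnvironmentInformation {
        public static final String FLINK_VERSION = "1.11-SNAPSHOT";
        public static final String SCALA_VERSION = "2.12";
        public static final String GIT_COMMIT_ID = "abc1234def5678";

        private GeneratedEnvironmentInformation() {}
    }

Since every value is a compile-time constant, downstream code needs no null checks or fallback defaults; if the file was not generated, the build simply fails to compile.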

So, is generating code similar to what I created a desired change?
My opinion is that it is the better solution: the data available is more reliable, and as a consequence the rest of the code is simpler. It would probably mean that during development of Flink you should be aware of this and do an 'extra step' to get it running.

What do you guys think?

--
Best regards / Met vriendelijke groeten,

Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes



