I agree with Aljoscha that we might consider moving from Travis to Jenkins. Is there any disadvantage in using Jenkins?

I think we should structure the project according to release management (e.g. more frequent releases of libraries) or other criteria (e.g. core and non-core) instead of build time. What would happen if the build of another submodule became too long? Would we split/restructure again and again? If Jenkins solves all our problems, we should use it.

Regards,
Timo



On 20/03/17 at 12:21, Aljoscha Krettek wrote:
I prefer Jenkins to Travis by far. Working on Beam, where we have good Jenkins 
integration, has opened my eyes to what is possible with good CI integration.

For example, look at this recent Beam PR: https://github.com/apache/beam/pull/2263.
The Jenkins-GitHub integration will tell you exactly which tests failed, and if you
click on the links you can look at the log output/stdout of the tests in question.

This is the overview page of one of the Jenkins jobs that we have in Beam:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/
This is an example of a stable build:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastStableBuild/
Notice how it gives you fine-grained information about the Maven run. This is an unstable run:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastUnstableBuild/
There you can see which tests failed and you can easily drill down.

Best,
Aljoscha

On 20 Mar 2017, at 11:46, Robert Metzger <rmetz...@apache.org> wrote:

Thank you for looking into the build times.

I didn't know that the build time situation is so bad. Even with yarn, mesos, 
connectors and libraries removed, we are still running into the build timeout :(

Aljoscha told me that the Beam community is using Jenkins for running the 
tests, and they are planning to completely move away from Travis. I wonder 
whether we should do the same, as having our own Jenkins servers would allow us 
to run tests for more than 50 minutes.

I agree with Stephan that we should keep the yarn and mesos tests in the core 
for stability / testing quality purposes.


On Mon, Mar 20, 2017 at 11:27 AM, Stephan Ewen <se...@apache.org> wrote:
@Greg

I am personally in favor of splitting "connectors" and "contrib" out as
well. I know that @rmetzger has some reservations about the connectors, but
we may be able to convince him.

For the cluster tests (yarn / mesos) - in the past there were many cases
where these tests caught problems that other tests did not, because they are
the only tests that actually use the "flink-dist.jar" and thus discover
many dependency and configuration issues. For that reason, my feeling would
be that they are valuable in the core repository.

I would actually suggest doing only the library split initially, to see
what the challenges are in setting up the multi-repo build and release
tooling. Once we have gathered experience there, we can probably easily see
what else we can split out.

Stephan


On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan <c...@greghogan.com> wrote:

I’d like to use this refactoring opportunity to unsplit the Travis tests.
With 51 builds queued up for the weekend (some of which may fail or have
been force-pushed) we are at the limit of the number of contributions we
can process. Fixing this requires 1) splitting the project, 2)
investigating speedups for long-running tests, and 3) staying cognizant of
test performance when accepting new code.

I’d like to add one to Stephan’s list of module groups. I like that the
modules are generic (“libraries”) so that no one module is alone and
independent.

Flink has three “libraries”: cep, ml, and gelly.

“connectors” is a hotspot due to the long-running Kafka tests (and
connectors for three Kafka versions).

Both flink-storm and flink-python have a modest number of tests and could
live with the miscellaneous modules in “contrib”.

The YARN tests are long-running and problematic (I am unable to
successfully run these locally). A “cluster” module could host flink-mesos,
flink-yarn, and flink-yarn-tests.

That gets us close to running all tests in a single Travis build.
   https://travis-ci.org/greghogan/flink/builds/212122590

I also tested (https://github.com/greghogan/flink/commits/core_build) with a
Maven parallelism of 2 and 4, with the latter yielding a 6.4% drop in build time.
   https://travis-ci.org/greghogan/flink/builds/212137659
   https://travis-ci.org/greghogan/flink/builds/212154470

We can run Travis CI builds nightly to guard against breaking changes.

I also wanted to get an idea of how disruptive it would be to developers
to divide the project into multiple git repos. I wrote a simple python
script and configured it with the module partitions listed above. The usage
string from the top of the file lists commits with files from multiple
partitions as well as the modified files.
   https://gist.github.com/greghogan/f38a8efe6b6dd5a162a6b43335ac4897
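
(The actual script is in the gist above. As a rough illustration of the
approach only, with hypothetical partition prefixes rather than the exact
configuration used for the numbers below, such an analysis boils down to
something like this:)

import subprocess
from collections import defaultdict

# Hypothetical partition prefixes; the real script is configured with the
# module groups discussed in this thread.
PARTITIONS = {
    "libraries": ("flink-libraries/",),
    "connectors": ("flink-connectors/",),
    "cluster": ("flink-yarn/", "flink-yarn-tests/", "flink-mesos/"),
}

def partition_of(path):
    for name, prefixes in PARTITIONS.items():
        if path.startswith(prefixes):
            return name
    return "core"

def mixed_commits(since):
    # "@<hash>" is a simple sentinel marking commit boundaries in the log output.
    log = subprocess.run(
        ["git", "log", "--since", since, "--name-only", "--pretty=format:@%H"],
        capture_output=True, text=True, check=True).stdout
    touched = defaultdict(set)
    current = None
    for line in log.splitlines():
        if line.startswith("@"):
            current = line[1:]
        elif line:
            touched[current].add(partition_of(line))
    mixed = sum(1 for parts in touched.values() if len(parts) > 1)
    return mixed, len(touched)

mixed, total = mixed_commits("2017-01-01")
print("%d of %d commits were mixed" % (mixed, total))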

Accounting for the merging of the batch and streaming connector modules,
and assuming that the project structure has not changed much over the past
15 months, for the following date ranges the listed number of commits would
have been split across repositories.

since "2017-01-01"
36 of 571 commits were mixed

since "2016-07-01"
155 of 1607 commits were mixed

since "2016-01-01"
272 of 2561 commits were mixed

Greg


On Mar 15, 2017, at 1:13 PM, Stephan Ewen <se...@apache.org> wrote:

@Robert - I think once we know that a separate git repo works well, and
that it actually solves problems, I see no reason to not create a
connectors repository later. The infrastructure changes should be identical
for two or more repositories.

On Wed, Mar 15, 2017 at 5:22 PM, Till Rohrmann <trohrm...@apache.org> wrote:
I think it should not be "at least the flink-dist module" but exactly the
remaining flink-dist module. Otherwise we do redundant work.

On Wed, Mar 15, 2017 at 5:03 PM, Robert Metzger <rmetz...@apache.org> wrote:

"flink-core" means the main repository, not the "flink-core" module.

When doing a release, we need to build the Flink main code first, because
the flink-libraries depend on that.
Once the "flink-libraries" are built, we need to run the main build again
(at least the flink-dist module), so that it pulls the artifacts from
the flink-libraries to put them into the opt/ folder of the final artifact.
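
(As a rough sketch of that ordering only, with illustrative module paths and
Maven invocations rather than the actual release tooling, it amounts to:)

import subprocess

def mvn(args, cwd):
    # Illustrative helper; flags and paths are assumptions, not the real scripts.
    subprocess.run(["mvn"] + args, cwd=cwd, check=True)

# 1) Build and install the main repository so the libraries can resolve its artifacts.
mvn(["clean", "install", "-DskipTests"], cwd="flink")

# 2) Build and install the libraries, which depend on the snapshot installed above.
mvn(["clean", "install", "-DskipTests"], cwd="flink-libraries")

# 3) Re-run the main build restricted to flink-dist ("-pl") so it can pull the
#    library artifacts into the opt/ folder of the final distribution.
mvn(["install", "-DskipTests", "-pl", "flink-dist"], cwd="flink")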



On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann <trohrm...@apache.org> wrote:

I'm ok with point 3.

Concerning point 8: Why do we have to build flink-core twice after having
it built as a dependency for flink-libraries? This seems wrong to me.

Cheers,
Till

On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetz...@apache.org> wrote:

Thank you. Running on AWS is a good idea!
Let me know if you (or anybody else) wants to help me with the
infrastructure work! Any help is much appreciated (as I've said before,
I don't really have time for doing this, but it has to be done :) )

I'm against creating two new repositories. I fear that this introduces too
much complexity and too many repositories.
"flink" and "flink-libraries" are hopefully enough to get the build time
significantly down.
We can also consider putting the connectors into the "flink-libraries" repo
if we need to further reduce the build time.

We should probably move "flink-table" out of "flink-libraries" if we want
to keep "flink-table" in the main repo. (This would eliminate the
"flink-libraries" module from main.)

Also, I agree that "flink-statebackend-rocksdb" is not correctly placed in
contrib anymore.


On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com> wrote:
Robert, appreciate your kickstarting this task.

We should compare the verification time with and without the listed
modules. I’ll try to run this by tomorrow on AWS and on Travis.

Should we maintain separate repos for flink-contrib and flink-libraries?
Are you intending that we move flink-table out of flink-libraries (and
perhaps flink-statebackend-rocksdb out of flink-contrib)?

Greg


On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetz...@apache.org> wrote:
Thank you for looking into this Till.

I think we then have to split the repositories.
My main motivation for doing this is that it seems to be the only feasible
way of scaling the community to allow more committers to work on the
libraries.

I'll take care of getting things started.

As the next steps I propose to:
1. Ask INFRA to rename
https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
to "flink-libraries"
2. Ask INFRA to set up GitHub and Travis integration for "flink-libraries"
3. Put the code of "flink-ml", "flink-gelly", "flink-python", "flink-cep",
"flink-scala-shell", "flink-storm" into the new repository. (I decided
against moving flink-contrib there, because rocksdb is in the contrib
module; for flink-table, I'm undecided, but I kept it in the main repo
because it's probably going to interact more with the core code in the
future.) I'll try to preserve the history of those modules when splitting
them into the new repo.
4. I'll close all pull requests against those modules in the main repo.
5. I'll set up a minimal documentation page for the library repository,
similar to the main documentation.
6. I'll update the documentation build process to build both documentations
& link them to each other.
7. I'll update the nightly deployment process to include both repositories.
8. I'll update the release script to create the Flink release out of both
repositories. In order to put the libraries into the opt/ dir of the
release, I'll need to change the build of "flink-dist" so that it first
builds Flink core, then the libraries, and then the core again with the
libraries as an additional dependency.

The main question for the community is: do you agree with point 3? Would
you like to include more or less?

I'll start with 1. and 2. tomorrow morning.



On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrm...@apache.org> wrote:
In theory we could have a merging bot which solves the problem of the
"commit window". Once the PR passes all tests and has enough +1s, the bot
could do the merging and, thus, it effectively linearizes the merge
process.
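
(Purely as an illustration: the endpoints below are from the public GitHub
REST API, while the repository name, token placeholder, polling interval and
approval rule are assumptions, not an existing bot.)

import time
import requests

API = "https://api.github.com/repos/apache/flink"
HEADERS = {"Authorization": "token <bot-token>"}  # placeholder credentials

def ci_green(sha):
    # Combined commit status as reported by the CI integration.
    status = requests.get(API + "/commits/" + sha + "/status", headers=HEADERS).json()
    return status["state"] == "success"

def approved(number, required=1):
    reviews = requests.get(API + "/pulls/%d/reviews" % number, headers=HEADERS).json()
    return sum(r["state"] == "APPROVED" for r in reviews) >= required

while True:
    prs = requests.get(API + "/pulls?state=open", headers=HEADERS).json()
    for pr in prs:
        if ci_green(pr["head"]["sha"]) and approved(pr["number"]):
            # Merge one PR per iteration, which linearizes the merge process.
            requests.put(API + "/pulls/%d/merge" % pr["number"], headers=HEADERS)
            break
    time.sleep(60)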

I think the second point is actually a disadvantage because there is not
such an immediate incentive/pressure to fix the broken module if it lives
in a separate repository. Furthermore, breaking API changes in the core
will most likely go unnoticed for some time in other modules which are not
developed so actively. In the worst case these things will only be noticed
when we try to make a release.

But I also agree that we are not Google and we don't have the capacity to
maintain such a smooth build process that we can keep all the code in a
single repository.

I looked a bit into Gradle and as far as I can tell it offers some nice
features wrt incrementally building projects. This would be beneficial for
local development but it would not solve our build time problems on Travis.
Gradle intends to introduce a task result cache which allows reusing
results across builds. This could help when building on Travis; however, it
is not yet fully implemented. Moreover, migrating from Maven to Gradle
won't come for free (there's simply no free lunch out there) and we might
risk introducing new bugs. Therefore, I would vote to split the repository
in order to mitigate our current problems with Travis and the build time in
general. Whether to use a different build system or not can then be
discussed as an orthogonal question.

Cheers,
Till

On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <se...@apache.org> wrote:
Some other thoughts on how a repository split would help. I am not sure
about all of them, so please comment:

- There is less competition for a "commit window". It happens a lot
already that you run all tests and want to commit, but there was a commit
in the meantime. You rebase, need to re-test, and again there is a commit
in the meantime.
   For a "linear" commit history, this may eventually become a bottleneck
as well.

- There is less risk of a broken master. If one repository/module breaks
its master, the others can still continue.

Stephan


On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for all your input. In order to wrap the discussion up I'd like to
summarize the mentioned points:

The problem of increasing build times and complexity of the project has
been acknowledged. Ideally we would have everything in one repository using
an incremental build tool. Since Maven does not properly support this, we
would have to switch our build tool to something like Gradle, for example.
Another option is introducing build profiles for different sets of modules
as well as separating integration and unit tests. The third alternative
would be creating sub-projects with their own repositories. I actually
think that these two proposals are not necessarily exclusive, and it would
also make sense to have a separation between unit and integration tests if
we split the repository.

The overall consensus seems to be that we don't want to split the community
and want to keep everything under the same umbrella. I think this is the
right way to go, because otherwise some parts of the project could become
second class citizens. Given that, and that we continue using Maven, I
still think that creating sub-projects for the libraries, for example,
could be beneficial. A split could reduce the project's complexity and make
it potentially easier for libraries to get actively developed. The main
concern is setting up the build infrastructure to aggregate docs from
multiple repositories and making them publicly available.

Since I started this thread and I would really like to see Flink's ML
library being revived again, I'd volunteer to investigate first whether it
is doable to establish a proper incremental build for Flink. If that should
not be possible, I will look into splitting the repository, first only for
the libraries. I'll share my results with the community once I'm done with
the investigation.

Cheers,
Till

On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org> wrote:

@Jin Mingjian: You cannot use the paid Travis version for open source
projects. It only works for private repositories (at least that was the
case back when we asked them about it).

@Stephan: I don't think that incremental builds will be available with
Maven anytime soon.

I agree that we need to fix the build time issue on Travis. I've recently
pushed a commit to use three instead of two test groups.
But I don't think that this is a feasible long-term solution.

If this discussion is only about reducing the build and test time,
introducing build profiles for different components as Aljoscha suggested
would solve the problem Till mentioned.
Also, if we decide that Travis is not a good tool for the testing anymore,
I guess we can find a different solution. There are now competitors to
Travis that might be willing to offer a paid plan for an open source
project, or we set up our own infra on a server sponsored by one of the
contributing companies.
If we want to solve "community issues" with the change as well, then I
think it's worth the effort of splitting up Flink into different
repositories.

Splitting up repositories is not a trivial task in my opinion. As others
have mentioned before, we need to consider the following things:
- How are we going to build the documentation? Ideally every repo should
contain its docs, so we would need to pull them together when building the
main docs.
- How do we organize the dependencies? If we have a library repository
depend on snapshot Flink versions, we need to make sure that the snapshot
deployment always works. This also means that people working on a library
repository will pull from snapshot OR need to build first locally.
- We need to update the release scripts.

If we commit to doing these changes, we need to assign at least one
committer (yes, in this case we need somebody who can commit, for example
for updating the buildbot stuff) who volunteers to do the change.
I've done a lot of infrastructure work in the past, but I'm currently
pretty booked with many other things, so I don't realistically see myself
doing that. Max, who used to work on these things, is taking some time off.
I think we need, best case, 3 days for the change, worst case 5 days. The
problem is that there are no "unit tests" for the infra stuff, so many
things are "trial and error" (like Apache's buildbot, our release scripts,
the doc scripts, maven stuff, nightly builds).



On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org> wrote:
If we can get incremental builds to work, that would actually be the
preferred solution in my opinion.
Many companies have invested heavily in making a "single repository" code
base work, because it has the advantage of not having to update/publish
several repositories first.
However, the strong prerequisite for that is an incremental build system
that builds only (fine-grained) what it has to build. I am not sure how we
could make that work with Maven and Travis...

On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com> wrote:
An additional option for reducing time to build and test is parallel
execution. This would help users more than on TravisCI since we're
generally running on multi-core machines rather than VM slices.
Is the idea that each user would only check out the modules that he or she
is developing with? For example, if a developer is not working on
flink-mesos or flink-yarn then the "flink-deploy" module would not be
cloned to their filesystem?

We can run a TravisCI nightly build on each repo to validate against API
changes.

Greg

On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com> wrote:
Hi everybody,

I think this should be a discussion about the benefits and drawbacks of
separating the code into distinct repositories from a development point of
view.
So I agree with Stephan that we should not divide the community by creating
separate groups of committers.
Also, the discussion about independent releases is not strictly related to
the decision, IMO.

I see a few pros and cons for splitting the code base into separate
repositories which (I think) haven't been mentioned before:
pros:
- IDE setup will be leaner. It is not necessary to compile the whole code
base to run a test after switching a branch.
cons:
- developing library features that require changes in the core / APIs
becomes more time consuming due to back-and-forth between code bases.
However, I think this is not very often the case.

Aljoscha has good points as well. Many of the build issues could be solved
by different build profiles and configurations.

Best, Fabian

2017-02-22 14:59 GMT+01:00 Gábor Hermann <m...@gaborhermann.com>:
@Stephan:

Although I tried to raise some issues about splitting committers, I'm
still strongly in favor of some kind of restructuring. We just have to be
conscious about the disadvantages.

Not splitting the committers could leave the libraries in the same
stalling status described by Till. Of course, dedicating current committers
as shepherds of the libraries could easily resolve the issue. But that
requires time from current committers. It seems like a trade-off between
code quality, speed of development, and committer effort.
From what I see in the discussion about ML, there are many people willing
to contribute as well as production use-cases. This means we could and
should move forward. However, the development speed is significantly slowed
down by stalling PRs. The proposal for contributors helping the review
process did not really work out so far. In my opinion, either code quality
(by more easily accepting new committers) or some committer time
(reviewing/merging) should be sacrificed to move forward. As Till has
indicated, it would be a shame if we let this contribution effort die.
Cheers,
Gabor





