I agree with Aljoscha that we might consider moving from Travis to Jenkins. Is there any disadvantage in using Jenkins?

I think we should structure the project according to release management (e.g. more frequent releases of libraries) or other criteria (e.g. core and non-core) instead of build time. What would happen if the build of another submodule became too long? Would we split/restructure again and again? If Jenkins solves all our problems, we should use it.

Regards,
Timo



On 20/03/17 at 12:21, Aljoscha Krettek wrote:
I prefer Jenkins to Travis by far. Working on Beam, where we have good Jenkins 
integration, has opened my eyes to what is possible with good CI integration.

For example, look at this recent Beam PR: https://github.com/apache/beam/pull/2263.
The Jenkins-GitHub integration will tell you exactly which tests failed, and if you
click on the links you can look at the log output/stdout of the tests in question.

This is the overview page of one of the Jenkins jobs that we have in Beam:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/
This is an example of a stable build:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastStableBuild/
Notice how it gives you fine-grained information about the Maven run. This is an unstable run:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastUnstableBuild/
There you can see which tests failed and you can easily drill down.

Best,
Aljoscha

On 20 Mar 2017, at 11:46, Robert Metzger <rmetz...@apache.org> wrote:

Thank you for looking into the build times.

I didn't know that the build time situation is so bad. Even with yarn, mesos, 
connectors and libraries removed, we are still running into the build timeout :(

Aljoscha told me that the Beam community is using Jenkins for running the 
tests, and they are planning to completely move away from Travis. I wonder 
whether we should do the same, as having our own Jenkins servers would allow us 
to run tests for more than 50 minutes.

I agree with Stephan that we should keep the yarn and mesos tests in the core 
for stability / testing quality purposes.


On Mon, Mar 20, 2017 at 11:27 AM, Stephan Ewen <se...@apache.org> wrote:
@Greg

I am personally in favor of splitting "connectors" and "contrib" out as
well. I know that @rmetzger has some reservations about the connectors, but
we may be able to convince him.

For the cluster tests (yarn / mesos) - in the past there were many cases
where these tests caught problems that other tests did not, because they are
the only tests that actually use the "flink-dist.jar" and thus discover
many dependency and configuration issues. For that reason, my feeling would
be that they are valuable in the core repository.

I would actually suggest doing only the library split initially, to see
what the challenges are in setting up the multi-repo build and release
tooling. Once we have gathered experience there, we can probably easily see
what else we can split out.

Stephan


On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan <c...@greghogan.com> wrote:

I’d like to use this refactoring opportunity to unsplit the Travis tests.
With 51 builds queued up for the weekend (some of which may fail or have
been force-pushed) we are at the limit of the number of contributions we
can process. Fixing this requires 1) splitting the project, 2)
investigating speedups for long-running tests, and 3) staying cognizant of
test performance when accepting new code.

I’d like to add one to Stephan’s list of module groups. I like that the
modules are generic (“libraries”) so that no one module is alone and
independent.

Flink has three “libraries”: cep, ml, and gelly.

“connectors” is a hotspot due to the long-running Kafka tests (and
connectors for three Kafka versions).

Both flink-storm and flink-python have a modest number of tests and could
live with the miscellaneous modules in “contrib”.

The YARN tests are long-running and problematic (I am unable to
successfully run these locally). A “cluster” module could host flink-mesos,
flink-yarn, and flink-yarn-tests.

That gets us close to running all tests in a single Travis build.
   https://travis-ci.org/greghogan/flink/builds/212122590

I also tested (https://github.com/greghogan/flink/commits/core_build) with a
Maven parallelism of 2 and 4, with the latter yielding a 6.4% drop in build time.
   https://travis-ci.org/greghogan/flink/builds/212137659
   https://travis-ci.org/greghogan/flink/builds/212154470

We can run Travis CI builds nightly to guard against breaking changes.

I also wanted to get an idea of how disruptive it would be to developers
to divide the project into multiple git repos. I wrote a simple python
script and configured it with the module partitions listed above. The usage
string from the top of the file lists commits with files from multiple
partitions as well as the modified files.
   https://gist.github.com/greghogan/f38a8efe6b6dd5a162a6b43335ac4897
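
(The actual script is in the gist above. As a rough illustration of the
approach only, with hypothetical partition prefixes rather than the exact
configuration used for the numbers below, such an analysis boils down to
something like this:)

import subprocess
from collections import defaultdict

# Hypothetical partition prefixes; the real script is configured with the
# module groups discussed in this thread.
PARTITIONS = {
    "libraries": ("flink-libraries/",),
    "connectors": ("flink-connectors/",),
    "cluster": ("flink-yarn/", "flink-yarn-tests/", "flink-mesos/"),
}

def partition_of(path):
    for name, prefixes in PARTITIONS.items():
        if path.startswith(prefixes):
            return name
    return "core"

def mixed_commits(since):
    # "@<hash>" is a simple sentinel marking commit boundaries in the log output.
    log = subprocess.run(
        ["git", "log", "--since", since, "--name-only", "--pretty=format:@%H"],
        capture_output=True, text=True, check=True).stdout
    touched = defaultdict(set)
    current = None
    for line in log.splitlines():
        if line.startswith("@"):
            current = line[1:]
        elif line:
            touched[current].add(partition_of(line))
    mixed = sum(1 for parts in touched.values() if len(parts) > 1)
    return mixed, len(touched)

mixed, total = mixed_commits("2017-01-01")
print("%d of %d commits were mixed" % (mixed, total))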

Accounting for the merging of the batch and streaming connector modules,
and assuming that the project structure has not changed much over the past
15 months, for the following date ranges the listed number of commits would
have been split across repositories.

since "2017-01-01"
36 of 571 commits were mixed

since "2016-07-01"
155 of 1607 commits were mixed

since "2016-01-01"
272 of 2561 commits were mixed

Greg


On Mar 15, 2017, at 1:13 PM, Stephan Ewen <se...@apache.org> wrote:

@Robert - I think once we know that a separate git repo works well, and
that it actually solves problems, I see no reason to not create a
connectors repository later. The infrastructure changes should be identical
for two or more repositories.

On Wed, Mar 15, 2017 at 5:22 PM, Till Rohrmann <trohrm...@apache.org> wrote:
I think it should not be "at least the flink-dist module" but exactly the
remaining flink-dist module. Otherwise we do redundant work.

On Wed, Mar 15, 2017 at 5:03 PM, Robert Metzger <rmetz...@apache.org> wrote:

"flink-core" means the main repository, not the "flink-core" module.

When doing a release, we need to build the Flink main code first, because
the flink-libraries depend on that.
Once the "flink-libraries" are built, we need to run the main build again
(at least the flink-dist module), so that it pulls the artifacts from
the flink-libraries to put them into the opt/ folder of the final artifact.
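
(As a rough sketch of that ordering only, with illustrative module paths and
Maven invocations rather than the actual release tooling, it amounts to:)

import subprocess

def mvn(args, cwd):
    # Illustrative helper; flags and paths are assumptions, not the real scripts.
    subprocess.run(["mvn"] + args, cwd=cwd, check=True)

# 1) Build and install the main repository so the libraries can resolve its artifacts.
mvn(["clean", "install", "-DskipTests"], cwd="flink")

# 2) Build and install the libraries, which depend on the snapshot installed above.
mvn(["clean", "install", "-DskipTests"], cwd="flink-libraries")

# 3) Re-run the main build restricted to flink-dist ("-pl") so it can pull the
#    library artifacts into the opt/ folder of the final distribution.
mvn(["install", "-DskipTests", "-pl", "flink-dist"], cwd="flink")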



On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann <trohrm...@apache.org> wrote:

I'm ok with point 3.

Concerning point 8: Why do we have to build flink-core twice after having
it built as a dependency for flink-libraries? This seems wrong to me.

Cheers,
Till

On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetz...@apache.org> wrote:

Thank you. Running on AWS is a good idea!
Let me know if you (or anybody else) wants to help me with the
infrastructure work! Any help is much appreciated (as I've said before,
I don't really have time for doing this, but it has to be done :) )

I'm against creating two new repositories. I fear that this introduces too
much complexity and too many repositories.
"flink" and "flink-libraries" are hopefully enough to get the build time
significantly down.
We can also consider putting the connectors into the "flink-libraries" repo
if we need to further reduce the build time.

We should probably move "flink-table" out of "flink-libraries" if we want
to keep "flink-table" in the main repo. (This would eliminate the
"flink-libraries" module from main.)

Also, I agree that "flink-statebackend-rocksdb" is not correctly placed in
contrib anymore.


On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com> wrote:
Robert, appreciate your kickstarting this task.

We should compare the verification time with and without the listed
modules. I’ll try to run this by tomorrow on AWS and on Travis.

Should we maintain separate repos for flink-contrib and flink-libraries?
Are you intending that we move flink-table out of flink-libraries (and
perhaps flink-statebackend-rocksdb out of flink-contrib)?

Greg


On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetz...@apache.org> wrote:
Thank you for looking into this Till.

I think we then have to split the repositories.
My main motivation for doing this is that it seems to be the only feasible
way of scaling the community to allow more committers to work on the
libraries.

I'll take care of getting things started.

As the next steps I propose to:
1. Ask INFRA to rename
https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
to "flink-libraries"
2. Ask INFRA to set up GitHub and Travis integration for "flink-libraries"
3. Put the code of "flink-ml", "flink-gelly", "flink-python", "flink-cep",
"flink-scala-shell", "flink-storm" into the new repository. (I decided
against moving flink-contrib there, because rocksdb is in the contrib
module; for flink-table, I'm undecided, but I kept it in the main repo
because it's probably going to interact more with the core code in the
future.) I'll try to preserve the history of those modules when splitting
them into the new repo.
4. I'll close all pull requests against those modules in the main repo.
5. I'll set up a minimal documentation page for the library repository,
similar to the main documentation.
6. I'll update the documentation build process to build both documentations
& link them to each other.
7. I'll update the nightly deployment process to include both repositories.
8. I'll update the release script to create the Flink release out of both
repositories. In order to put the libraries into the opt/ dir of the
release, I'll need to change the build of "flink-dist" so that it first
builds Flink core, then the libraries, and then the core again with the
libraries as an additional dependency.

The main question for the community is: do you agree with point 3? Would
you like to include more or less?

I'll start with 1. and 2. tomorrow morning.



On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrm...@apache.org> wrote:
In theory we could have a merging bot which solves the problem of the
"commit window". Once the PR passes all tests and has enough +1s, the bot
could do the merging and, thus, it effectively linearizes the merge
process.
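
(Purely as an illustration: the endpoints below are from the public GitHub
REST API, while the repository name, token placeholder, polling interval and
approval rule are assumptions, not an existing bot.)

import time
import requests

API = "https://api.github.com/repos/apache/flink"
HEADERS = {"Authorization": "token <bot-token>"}  # placeholder credentials

def ci_green(sha):
    # Combined commit status as reported by the CI integration.
    status = requests.get(API + "/commits/" + sha + "/status", headers=HEADERS).json()
    return status["state"] == "success"

def approved(number, required=1):
    reviews = requests.get(API + "/pulls/%d/reviews" % number, headers=HEADERS).json()
    return sum(r["state"] == "APPROVED" for r in reviews) >= required

while True:
    prs = requests.get(API + "/pulls?state=open", headers=HEADERS).json()
    for pr in prs:
        if ci_green(pr["head"]["sha"]) and approved(pr["number"]):
            # Merge one PR per iteration, which linearizes the merge process.
            requests.put(API + "/pulls/%d/merge" % pr["number"], headers=HEADERS)
            break
    time.sleep(60)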

I think the second point is actually a disadvantage because there is not
such an immediate incentive/pressure to fix the broken module if it lives
in a separate repository. Furthermore, breaking API changes in the core
will most likely go unnoticed for some time in other modules which are not
developed so actively. In the worst case these things will only be noticed
when we try to make a release.

But I also agree that we are not Google and we don't have the capacity to
maintain such a smooth build process that we can keep all the code in a
single repository.

I looked a bit into Gradle and as far as I can tell it offers some nice
features wrt incrementally building projects. This would be beneficial for
local development but it would not solve our build time problems on Travis.
Gradle intends to introduce a task result cache which allows reusing
results across builds. This could help when building on Travis; however, it
is not yet fully implemented. Moreover, migrating from Maven to Gradle
won't come for free (there's simply no free lunch out there) and we might
risk introducing new bugs. Therefore, I would vote to split the repository
in order to mitigate our current problems with Travis and the build time in
general. Whether to use a different build system or not can then be
discussed as an orthogonal question.

Cheers,
Till

On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <se...@apache.org> wrote:
Some other thoughts on how a repository split would help. I am not sure
about all of them, so please comment:

- There is less competition for a "commit window". It happens a lot
already that you run all tests and want to commit, but there was a commit
in the meantime. You rebase, need to re-test, and again there is a commit
in the meantime.
   For a "linear" commit history, this may eventually become a bottleneck
as well.

- There is less risk of a broken master. If one repository/module breaks
its master, the others can still continue.

Stephan


On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for all your input. In order to wrap the discussion up I'd like to
summarize the mentioned points:

The problem of increasing build times and complexity of the project has
been acknowledged. Ideally we would have everything in one repository using
an incremental build tool. Since Maven does not properly support this, we
would have to switch our build tool to something like Gradle, for example.
Another option is introducing build profiles for different sets of modules
as well as separating integration and unit tests. The third alternative
would be creating sub-projects with their own repositories. I actually
think that these two proposals are not necessarily exclusive, and it would
also make sense to have a separation between unit and integration tests if
we split the repository.

The overall consensus seems to be that we don't want to split the community
and want to keep everything under the same umbrella. I think this is the
right way to go, because otherwise some parts of the project could become
second class citizens. Given that, and that we continue using Maven, I
still think that creating sub-projects for the libraries, for example,
could be beneficial. A split could reduce the project's complexity and make
it potentially easier for libraries to get actively developed. The main
concern is setting up the build infrastructure to aggregate docs from
multiple repositories and making them publicly available.

Since I started this thread and I would really like to see Flink's ML
library being revived again, I'd volunteer to investigate first whether it
is doable to establish a proper incremental build for Flink. If that should
not be possible, I will look into splitting the repository, first only for
the libraries. I'll share my results with the community once I'm done with
the investigation.

Cheers,
Till

On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org> wrote:

@Jin Mingjian: You cannot use the paid Travis version for open source
projects. It only works for private repositories (at least that was the
case back when we asked them about it).

@Stephan: I don't think that incremental builds will be available with
Maven anytime soon.

I agree that we need to fix the build time issue on Travis. I've recently
pushed a commit to use three instead of two test groups.
But I don't think that this is a feasible long-term solution.

If this discussion is only about reducing the build and test time,
introducing build profiles for different components as Aljoscha suggested
would solve the problem Till mentioned.
Also, if we decide that Travis is not a good tool for the testing anymore,
I guess we can find a different solution. There are now competitors to
Travis that might be willing to offer a paid plan for an open source
project, or we set up our own infra on a server sponsored by one of the
contributing companies.
If we want to solve "community issues" with the change as well, then I
think it's worth the effort of splitting up Flink into different
repositories.

Splitting up repositories is not a trivial task in my opinion. As others
have mentioned before, we need to consider the following things:
- How are we going to build the documentation? Ideally every repo should
contain its docs, so we would need to pull them together when building the
main docs.
- How do we organize the dependencies? If we have a library repository
depend on snapshot Flink versions, we need to make sure that the snapshot
deployment always works. This also means that people working on a library
repository will pull from snapshot OR need to build first locally.
- We need to update the release scripts.

If we commit to doing these changes, we need to assign at least one
committer (yes, in this case we need somebody who can commit, for example
for updating the buildbot stuff) who volunteers to do the change.
I've done a lot of infrastructure work in the past, but I'm currently
pretty booked with many other things, so I don't realistically see myself
doing that. Max, who used to work on these things, is taking some time off.
I think we need, best case, 3 days for the change, worst case 5 days. The
problem is that there are no "unit tests" for the infra stuff, so many
things are "trial and error" (like Apache's buildbot, our release scripts,
the doc scripts, maven stuff, nightly builds).



On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org> wrote:
If we can get incremental builds to work, that would actually be the
preferred solution in my opinion.
Many companies have invested heavily in making a "single repository" code
base work, because it has the advantage of not having to update/publish
several repositories first.
However, the strong prerequisite for that is an incremental build system
that builds only (fine-grained) what it has to build. I am not sure how we
could make that work with Maven and Travis...

On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com> wrote:
An additional option for reducing time to build and test is parallel
execution. This would help users more than on TravisCI since we're
generally running on multi-core machines rather than VM slices.
Is the idea that each user would only check out the modules that he or she
is developing with? For example, if a developer is not working on
flink-mesos or flink-yarn then the "flink-deploy" module would not be
cloned to their filesystem?

We can run a TravisCI nightly build on each repo to validate against API
changes.

Greg

On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com> wrote:
Hi everybody,

I think this should be a discussion about the benefits and drawbacks of
separating the code into distinct repositories from a development point of
view.
So I agree with Stephan that we should not divide the community by creating
separate groups of committers.
Also, the discussion about independent releases is not strictly related to
the decision, IMO.

I see a few pros and cons for splitting the code base into separate
repositories which (I think) haven't been mentioned before:
pros:
- IDE setup will be leaner. It is not necessary to compile the whole code
base to run a test after switching a branch.
cons:
- developing library features that require changes in the core / APIs
becomes more time consuming due to back-and-forth between code bases.
However, I think this is not very often the case.

Aljoscha has good points as well. Many of the build issues could be solved
by different build profiles and configurations.

Best, Fabian

2017-02-22 14:59 GMT+01:00 Gábor Hermann <m...@gaborhermann.com>:
@Stephan:

Although I tried to raise some issues about splitting committers, I'm
still strongly in favor of some kind of restructuring. We just have to be
conscious about the disadvantages.

Not splitting the committers could leave the libraries in the same
stalling status described by Till. Of course, dedicating current committers
as shepherds of the libraries could easily resolve the issue. But that
requires time from current committers. It seems like a trade-off between
code quality, speed of development, and committer effort.
From what I see in the discussion about ML, there are many people willing
to contribute as well as production use-cases. This means we could and
should move forward. However, the development speed is significantly slowed
down by stalling PRs. The proposal for contributors helping the review
process did not really work out so far. In my opinion, either code quality
(by more easily accepting new committers) or some committer time
(reviewing/merging) should be sacrificed to move forward. As Till has
indicated, it would be a shame if we let this contribution effort die.
Cheers,
Gabor





