Re: [DISCUSS] Project build time and possible restructuring

Theodore Vasiloudis Tue, 21 Feb 2017 08:45:09 -0800

Hello all,

From a library developer's POV, I think splitting up the project will have
more advantages than disadvantages.
API-breaking changes should become the responsibility of library
developers, and with automated tests
they shouldn't be too hard to catch.


I think I'm more in favor of synced releases, so as not to confuse users. If we
are going to present the Flink stack
as an integrated product, then as a user I would expect everything to be under
one release schedule, and not
have to worry about different versions of different parts of the stack.

If we were to split, how would that work under the ASF? Is it possible to
have someone be a committer for
a library but not for the core?

Regards,
Theodore


On Tue, Feb 21, 2017 at 1:44 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Flink community,
>
> I'd like to revive a discussion about Flink's build time and project
> structure which we already had in some other mailing thread [1] and which
> we wanted to have after the 1.2 release.
>
> Recently, Flink has more and more often been exceeding Travis'
> maximum build time of 50 minutes. This leads to failing builds, as can be
> seen here [2]. Almost 50% of my last builds on Travis failed because of
> the 50-minute time limit.
>
> Exceeding the time limit not only prevents some tests (especially the
> yarn tests) from being executed regularly, but it also undermines people's
> trust in CI. We've seen in the past that when we had some flaky tests,
> there was an acceptance to merge PRs even though Travis failed because
> the failing tests were "always" unrelated. But how sure can you be about
> that? Having a properly working and reliable CI system is imo crucial for
> guaranteeing Flink's high quality standard.
>
> In the past we've split Flink's tests into two groups which are executed
> separately in order to cope with increasing build times. This could again
> be a solution to the problem.
>
> However, there is also another problem of slowly increasing build times for
> Flink. On my machine, building Flink with deactivated tests takes about 10
> minutes. That's mainly because Flink has grown quite big, now containing not
> only the runtime and APIs but also several libraries and a contribution
> module. Stephan proposed to split up the repository into the following set:
>
>   - flink-core (core, apis, runtime, clients)
>   - flink-libraries (gelly, ml, cep, table, scala shell, python)
>   - flink-connectors
>   - flink-contrib
>
> in order to make the project more maintainable and to decrease build as
> well as test times. Of course, such a split would raise the question of how and
> how often the individual modules are released. Will they follow an
> independent release cycle or will they be synced? Moreover, the problem of
> API stability across module boundaries will arise. Changing things in the
> core repository might break things in a library repository and since they
> are independent this break might go unnoticed for some time. Stephan's
> proposal also includes that the new repositories will be governed by the
> same PMC.
>
> A little bit off-topic but also somewhat related is how we handle the load
> of outside contributions for modules where we don't have many committers
> present. Good examples (actually they are bad examples for community work)
> are the ML and the CEP library. These libraries started out promising and
> attracted outside contributions. However, due to a lack of committers who
> could spend time on these libraries, their development stalled and made
> many contributors turn away from them. Maybe such a split makes things easier
> wrt making more contributors committers. Moreover, an independent
> release cycle for volatile projects might help increase adoption, because
> bug-fixes can be delivered more frequently.
>
> Recently, I've seen an increased interest in and really good discussions
> about FlinkML's future [3]. I really would not like to repeat the same
> mistakes and let this effort die again by simply not being responsive to
> contributors who would like to get involved. The only way I see this
> happening is to add more committers to the ML library. And maybe we feel
> more comfortable adding new committers faster to repos which are not Flink
> core.
>
> I know we should first discuss the former problem and find a conclusion
> there. But I mentioned the outside contributors problem as well because it
> is an argument for a repo split.
>
> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Travis-CI-tt14478.html
> [2] https://travis-ci.org/tillrohrmann/flink/builds/203479275
> [3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-ML-roadmap-tt16040.html
>
> Cheers,
> Till
>
