+1 for Stephan's suggestion. For example, SQL connectors have never been part of the main distribution, and nobody has complained about this so far. I think what is more important than a big dist bundle is a helpful "Downloads" page where users can easily find the available filesystems, connectors, and metric reporters. Not everyone checks Maven Central for available JAR files. I just saw that we recently added an "Optional components" section [1]; we just need to make it more prominent. This is also done for the SQL connectors and formats [2].

[1] https://flink.apache.org/downloads.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#dependencies

Regards,
Timo

On 23.01.19 at 10:07, Ufuk Celebi wrote:
I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as possible.

What do you think about building a lean distribution by default and a
"full" distribution that still bundles all the optional dependencies
for releases? (If you don't think that's feasible I'm still +1 to only
go with the "lean dist" approach.)

– Ufuk

On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen <se...@apache.org> wrote:
There are some points where a leaner approach could help.
There are many libraries and connectors that are currently being added to
Flink, which makes the "include all" approach not completely feasible in
the long run:

   - Connectors: For a proper experience with the Shell/CLI (for example for
SQL) we need a lot of fat connector jars.
     These often come in multiple versions, which alone accounts for 100s
of MBs of connector jars.
   - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
themselves.
   - The metric reporters are bit by bit growing as well.

The following could be a compromise:

The flink-dist would include
   - the core flink libraries (core, apis, runtime, etc.)
   - yarn / mesos  etc. adapters
   - examples (the examples should be a small set of self-contained programs
without additional dependencies)
   - default logging
   - default metric reporter (jmx)
   - shells (scala, sql)

The flink-dist would NOT include the following libs (and these would be
offered for individual download)
   - Hadoop libs
   - the pre-shaded file systems
   - the pre-packaged SQL connectors
   - additional metric reporters


On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <zjf...@gmail.com> wrote:

Thanks Chesnay for raising this discussion thread. I think there are 3
major usage scenarios for the Flink binary distribution:

1. Using it to set up a standalone cluster
2. Using it to try out features of Flink, such as via scala-shell or
sql-client
3. Downstream projects using it to integrate with their systems

I did a size estimate of the flink dist folder: the lib folder takes around 100M
and the opt folder takes around 200M. Overall I agree with making flink dist thin.
So the next question is which components to drop. I checked the opt folder,
and I think the filesystem components and metrics components could be moved
out, because they are pluggable components and are only used in scenario 1, I
think (setting up a standalone cluster). Other components like flink-table,
flink-ml, and flink-gelly we should still keep IMHO, because new users may
still use them to try out the features of Flink. For me, scala-shell is the first
option for trying new features of Flink.



On Fri, Jan 18, 2019 at 7:34 PM Fabian Hueske <fhue...@gmail.com> wrote:

Hi Chesnay,

Thank you for the proposal.
I think this is a good idea.
We follow a similar approach already for Hadoop dependencies and
connectors (although in application space).

+1

Fabian

On Fri, Jan 18, 2019 at 10:59 AM Chesnay Schepler <ches...@apache.org> wrote:

Hello,

the binary distribution that we currently release contains quite a lot of
optional components, including various filesystems, metric reporters and
libraries. Most users will only use a fraction of these, so the rest
pretty much only increases the size of flink-dist.

With Flink growing more and more in scope I don't believe it to be
feasible to ship everything we have with every distribution, and instead
suggest more of a "pick-what-you-need" model, where flink-dist is rather
lean and additional components are downloaded separately and added by
the user.

This would primarily affect the /opt directory, but could also be
extended to cover flink-dist. For example, the yarn and mesos code could
be spliced out into separate jars that could be added to lib manually.
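
To make the "downloaded separately and added to lib manually" step concrete, here is a minimal Python sketch of what a user (or a small helper script) would do. The function name, the example jar name, and the directory layout beyond a `lib` folder under the Flink home directory are assumptions for illustration; nothing here is an existing Flink tool.

```python
import shutil
from pathlib import Path


def install_optional_jar(jar: Path, flink_home: Path) -> Path:
    """Copy a separately downloaded jar into <flink_home>/lib.

    Hypothetical helper: it only copies the file and does not
    check versions or resolve dependencies.
    """
    if jar.suffix != ".jar":
        raise ValueError(f"not a jar file: {jar}")
    if not jar.is_file():
        raise FileNotFoundError(jar)

    lib_dir = flink_home / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)

    target = lib_dir / jar.name
    shutil.copy2(jar, target)  # copies content and file metadata
    return target


if __name__ == "__main__":
    import tempfile

    # Demo with temporary directories and a placeholder jar name.
    with tempfile.TemporaryDirectory() as tmp:
        downloads = Path(tmp) / "downloads"
        downloads.mkdir()
        jar = downloads / "some-optional-component.jar"
        jar.write_bytes(b"placeholder")
        print(install_optional_jar(jar, Path(tmp) / "flink"))
```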

Let me know what you think.

Regards,

Chesnay


--
Best Regards

Jeff Zhang

