I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as possible.

What do you think about building a lean distribution by default and a
"full" distribution that still bundles all the optional dependencies
for releases? (If you don't think that's feasible I'm still +1 to only
go with the "lean dist" approach.)

– Ufuk

On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen <se...@apache.org> wrote:
>
> There are some points where a leaner approach could help.
> There are many libraries and connectors that are currently being added to
> Flink, which makes the "include all" approach not completely feasible in
> the long run:
>
>   - Connectors: For a proper experience with the Shell/CLI (for example for
> SQL) we need a lot of fat connector jars.
>     These often come in multiple versions, which alone accounts for 100s
> of MBs of connector jars.
>   - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
> themselves.
>   - The metric reporters are bit by bit growing as well.
>
> The following could be a compromise:
>
> The flink-dist would include
>   - the core flink libraries (core, apis, runtime, etc.)
>   - yarn / mesos  etc. adapters
>   - examples (the examples should be a small set of self-contained programs
> without additional dependencies)
>   - default logging
>   - default metric reporter (jmx)
>   - shells (scala, sql)
>
> The flink-dist would NOT include the following libs (and these would be
> offered for individual download)
>   - Hadoop libs
>   - the pre-shaded file systems
>   - the pre-packaged SQL connectors
>   - additional metric reporters
>
>
> On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <zjf...@gmail.com> wrote:
>
> > Thanks, Chesnay, for raising this discussion. I think there are 3
> > major usage scenarios for the Flink binary distribution.
> >
> > 1. Use it to set up a standalone cluster
> > 2. Use it to experience features of Flink, such as via scala-shell or
> > sql-client
> > 3. Downstream projects use it to integrate with their systems
> >
> > I estimated the size of the flink dist folder: the lib folder takes around
> > 100 MB and the opt folder takes around 200 MB. Overall I agree with making
> > a thin flink dist. So the next question is which components to drop. I
> > checked the opt folder, and I think the filesystem components and metrics
> > components could be moved out, because they are pluggable components and
> > are only used in scenario 1, I think (setting up a standalone cluster).
> > Other components like flink-table, flink-ml, and flink-gelly we should
> > still keep, IMHO, because new users may still use them to try out the
> > features of Flink. For me, scala-shell is the first option for trying new
> > features of Flink.
> >
> >
> >
> > On Fri, Jan 18, 2019 at 7:34 PM Fabian Hueske <fhue...@gmail.com> wrote:
> >
> >> Hi Chesnay,
> >>
> >> Thank you for the proposal.
> >> I think this is a good idea.
> >> We follow a similar approach already for Hadoop dependencies and
> >> connectors (although in application space).
> >>
> >> +1
> >>
> >> Fabian
> >>
> > >> On Fri, Jan 18, 2019 at 10:59 AM Chesnay Schepler <
> > >> ches...@apache.org> wrote:
> >>
> >>> Hello,
> >>>
> > >>> the binary distribution that we release now contains quite a lot of
> > >>> optional components, including various filesystems, metric reporters and
> > >>> libraries. Most users will only use a fraction of these, so the rest
> > >>> pretty much only increase the size of flink-dist.
> >>>
> >>> With Flink growing more and more in scope I don't believe it to be
> >>> feasible to ship everything we have with every distribution, and instead
> >>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
> >>> lean and additional components are downloaded separately and added by
> >>> the user.
> >>>
> >>> This would primarily affect the /opt directory, but could also be
> >>> extended to cover flink-dist. For example, the yarn and mesos code could
> >>> be spliced out into separate jars that could be added to lib manually.
> >>>
> >>> Let me know what you think.
> >>>
> >>> Regards,
> >>>
> >>> Chesnay
> >>>
> >>>
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
