There are some points where a leaner approach could help.
There are many libraries and connectors currently being added to
Flink, which makes the "include all" approach not feasible in
the long run:

  - Connectors: For a proper experience with the Shell/CLI (for example for
SQL) we need a lot of fat connector jars.
    These often come in multiple versions, which alone accounts for 100s
of MBs of connector jars.
  - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
themselves.
  - The metric reporters are growing bit by bit as well.

The following could be a compromise:

The flink-dist would include
  - the core flink libraries (core, apis, runtime, etc.)
  - yarn / mesos etc. adapters
  - examples (the examples should be a small set of self-contained programs
without additional dependencies)
  - default logging
  - default metric reporter (jmx)
  - shells (scala, sql)

The flink-dist would NOT include the following libs (and these would be
offered for individual download)
  - Hadoop libs
  - the pre-shaded file systems
  - the pre-packaged SQL connectors
  - additional metric reporters
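From a user's perspective, the "pick-what-you-need" model above could look roughly like this: download the lean flink-dist, then fetch any optional component separately and drop its jar into lib/. A hedged sketch follows; the directory layout mirrors today's flink-dist, but the connector jar name is a placeholder (simulated here with an empty file), not a real download artifact:

```shell
#!/bin/sh
# Sketch of the proposed workflow: a lean flink-dist plus
# separately downloaded optional components placed into lib/.
FLINK_HOME=./flink-dist-demo

# The lean distribution would ship lib/ with only the core jars.
mkdir -p "$FLINK_HOME/lib"

# A user needing, say, a pre-packaged SQL connector would download it
# from a separate location and drop it into lib/, e.g.:
#   wget -P "$FLINK_HOME/lib" <download-url>/flink-sql-connector-kafka-<version>.jar
# Simulated here with a placeholder file instead of a real download:
touch "$FLINK_HOME/lib/flink-sql-connector-kafka.jar"

# On the next start, flink-dist picks up everything on the lib/ classpath.
ls "$FLINK_HOME/lib"
```

The same mechanism would apply to the pre-shaded filesystems and additional metric reporters: each is a self-contained jar that only needs to land in lib/ (or be shipped with the user's application).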


On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <zjf...@gmail.com> wrote:

> Thanks Chesnay for raising this discussion thread.  I think there are 3
> major use scenarios for flink binary distribution.
>
> 1. Use it to set up standalone cluster
> 2. Use it to experience features of flink, such as via scala-shell,
> sql-client
> 3. Downstream project use it to integrate with their system
>
> I did a size estimation of the flink-dist folder: the lib folder takes
> around 100 MB and the opt folder around 200 MB. Overall I agree with making
> a thin flink-dist. So the next problem is which components to drop. I
> checked the opt folder, and I think the filesystem components and metrics
> components could be moved out, because they are pluggable components and
> are only used in scenario 1 (setting up a standalone cluster), I think.
> Other components like flink-table, flink-ml, flink-gelly we should still
> keep IMHO, because new users may still use them to try the features of
> Flink. For me, scala-shell is the first option to try new features of
> Flink.
>
>
>
> Fabian Hueske <fhue...@gmail.com> 于2019年1月18日周五 下午7:34写道:
>
>> Hi Chesnay,
>>
>> Thank you for the proposal.
>> I think this is a good idea.
>> We follow a similar approach already for Hadoop dependencies and
>> connectors (although in application space).
>>
>> +1
>>
>> Fabian
>>
>> Am Fr., 18. Jan. 2019 um 10:59 Uhr schrieb Chesnay Schepler <
>> ches...@apache.org>:
>>
>>> Hello,
>>>
>>> the binary distribution that we release by now contains quite a lot of
>>> optional components, including various filesystems, metric reporters and
>>> libraries. Most users will only use a fraction of these, and as such
>>> pretty much only increase the size of flink-dist.
>>>
>>> With Flink growing more and more in scope I don't believe it to be
>>> feasible to ship everything we have with every distribution, and instead
>>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
>>> lean and additional components are downloaded separately and added by
>>> the user.
>>>
>>> This would primarily affect the /opt directory, but could also be
>>> extended to cover flink-dist. For example, the yarn and mesos code could
>>> be spliced out into separate jars that could be added to lib manually.
>>>
>>> Let me know what you think.
>>>
>>> Regards,
>>>
>>> Chesnay
>>>
>>>
>
> --
> Best Regards
>
> Jeff Zhang
>