When we **pip install** a wheel package, pip just unpacks the wheel and
installs its dependencies [1]. There is no way to download anything from
an external website during installation. This differs from a source
package, where setup.py runs at install time and could download
something. This is explained in detail in [2]. So I'm afraid that
splitting the package is the only solution we have if we want to reduce
the package size of PyFlink.
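
To illustrate the point about wheels: a wheel is a zip archive with a
fixed layout, and installing it amounts to unpacking that archive, with
no setup.py step in between [1]. The following is only a minimal sketch,
not how pip is implemented. It builds a toy archive laid out like a wheel
in memory (toy_pkg, toy.jar and toy-libraries are made-up names) and
lists its members with the standard zipfile module.

```python
import io
import zipfile


def build_toy_wheel() -> bytes:
    """Build an in-memory zip laid out like a wheel (all names are made up)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Package payload: Python code plus a bundled (fake) jar file.
        zf.writestr("toy_pkg/__init__.py", "")
        zf.writestr("toy_pkg/lib/toy.jar", b"fake jar bytes")
        # Static metadata; dependencies are listed as Requires-Dist entries.
        zf.writestr(
            "toy_pkg-0.1.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: toy-pkg\nVersion: 0.1\n"
            "Requires-Dist: toy-libraries\n",
        )
        zf.writestr("toy_pkg-0.1.dist-info/WHEEL", "Wheel-Version: 1.0\n")
        zf.writestr("toy_pkg-0.1.dist-info/RECORD", "")
    return buf.getvalue()


def list_members(wheel_bytes: bytes) -> list:
    """Return the file names stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return zf.namelist()


members = list_members(build_toy_wheel())
# The toy archive contains no setup.py, so there is no install-time script.
has_setup_py = any(name.endswith("setup.py") for name in members)
print(members, has_setup_py)
```

Since dependencies are declared statically as Requires-Dist entries in
METADATA, a separate jar-only package could still be pulled in
automatically when the main package is installed.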

[1] https://www.python.org/dev/peps/pep-0427/
[2] https://realpython.com/python-wheels/#advantages-of-python-wheels
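
For reference, the size estimate quoted further down in this thread (7
packages per release at about 220MB each today, versus about 3MB each
after the split plus one 220MB jar-only package) can be worked through as
below. This is only a back-of-the-envelope sketch: release_size_mb is a
hypothetical helper, and the inputs are the rough numbers from the
thread, not measurements.

```python
def release_size_mb(num_packages: int, package_mb: int, shared_mb: int = 0) -> int:
    """Total upload size of one release in MB (hypothetical helper)."""
    return num_packages * package_mb + shared_mb


# Today: every package bundles the jars (about 220MB each, 7 per release).
bundled_mb = release_size_mb(7, 220)

# After the split: 7 small packages (about 3MB each) plus one jar-only package.
split_mb = release_size_mb(7, 3, shared_mb=220)

print(f"bundled: {bundled_mb} MB, split: {split_mb} MB")
```

That gives 1540MB versus 241MB per release, in line with the "about
1.5GB" and "about 250MB" figures in the thread.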

Best,
Xingbo

Till Rohrmann <trohrm...@apache.org> wrote on Fri, Mar 19, 2021 at 6:32 PM:

> I think that we should try to reduce the size of the packages by either
> splitting them or by having another means to retrieve the Java binaries.
>
> Cheers,
> Till
>
> On Fri, Mar 19, 2021 at 2:58 AM Xingbo Huang <hxbks...@gmail.com> wrote:
>
> > Hi Till,
> >
> > The package size of tensorflow [1] is also very big (300MB+). However,
> > the project does not try to reduce it; instead it asks PyPI to raise
> > the space limit whenever the project space is full. We could also
> > choose this option. At our current release frequency, we would probably
> > need to apply for a 15GB expansion every year. There are not many
> > similar cases, so there is no standard solution to refer to. Splitting
> > a project into multiple packages, however, is quite common. For
> > example, Apache Airflow prepares a corresponding release package for
> > each provider [2].
> >
> > So I currently see two solutions that could work.
> >
> > 1. Just keep the current solution and expand the space limit in PyPI
> > whenever the space is full.
> >
> > 2. Split into two packages to reduce the wheel package size.
> >
> > [1] https://pypi.org/project/tensorflow/#files
> > [2] https://pypi.org/search/?q=apache-airflow-*&o=
> >
> > Best,
> > Xingbo
> >
> > Till Rohrmann <trohrm...@apache.org> wrote on Wed, Mar 17, 2021 at 9:22 PM:
> >
> > > How do other projects solve this problem?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Mar 17, 2021 at 3:45 AM Xingbo Huang <hxbks...@gmail.com> wrote:
> > >
> > > > Hi Chesnay,
> > > >
> > > > Yes, in most cases we can indeed download the required jars in
> > > > `setup.py`, which is also the solution I originally thought of for
> > > > reducing the size of the wheel packages. However, I'm afraid that it
> > > > will not work when access to the external network is not possible,
> > > > which is very common in production clusters.
> > > >
> > > > Best,
> > > > Xingbo
> > > >
> > > > Chesnay Schepler <ches...@apache.org> wrote on Tue, Mar 16, 2021 at 8:32 PM:
> > > >
> > > > > This proposed apache-flink-libraries package would just contain the
> > > > > binary, right? And effectively be unusable to the Python audience on
> > > > > its own.
> > > > >
> > > > > Essentially we are just abusing PyPI for shipping a Java binary. Is
> > > > > there no way for us to download the jars when the Python package is
> > > > > being installed? (e.g., in setup.py)
> > > > >
> > > > > On 3/16/2021 1:23 PM, Dian Fu wrote:
> > > > > > > Yes, the size of the .whl files in PyFlink will also be about 3MB
> > > > > > > if we split the package. Currently the package is big because we
> > > > > > > bundle the jar files in it.
> > > > > >
> > > > > >> 2021年3月16日 下午8:13,Chesnay Schepler <ches...@apache.org> 写道:
> > > > > >>
> > > > > > >> The key difference being that the Beam .whl files are 3MB large,
> > > > > > >> aka 60x smaller.
> > > > > >>
> > > > > >> On 3/16/2021 1:06 PM, Dian Fu wrote:
> > > > > >>> Hi Chesnay,
> > > > > >>>
> > > > > > >>> We publish separate binary packages per:
> > > > > > >>> 1) Python version: 3.5 / 3.6 / 3.7 / 3.8 (3.8 since 1.12)
> > > > > > >>> 2) Platform: Linux / Mac
> > > > > >>>
> > > > > > >>> Besides, there is also a source package, which is used when
> > > > > > >>> none of the above binary packages is usable, e.g. for Windows
> > > > > > >>> users.
> > > > > >>>
> > > > > > >>> PS: publishing multiple binary packages is very common in the
> > > > > > >>> Python world, e.g. Beam published 22 packages in 2.28 [1] and
> > > > > > >>> Pandas published 16 packages in 1.2.3 [2]. We could also
> > > > > > >>> publish more packages if we split the package, as the cost of
> > > > > > >>> adding another package would be very small.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Dian
> > > > > >>>
> > > > > > >>> [1] https://pypi.org/project/apache-beam/#files
> > > > > >>> [2] https://pypi.org/project/pandas/#files
> > > > > >>>
> > > > > >>>
> > > > > >>> Hi Xintong,
> > > > > >>>
> > > > > > >>> Yes, you are right that there are 9 packages in 1.12, as we
> > > > > > >>> added Python 3.8 support in 1.12.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Dian
> > > > > >>>
> > > > > >>>> 2021年3月16日 下午7:45,Xintong Song <tonysong...@gmail.com> 写道:
> > > > > >>>>
> > > > > > >>>> And it's not only uploaded to PyPI, but to the ASF mirrors as well.
> > > > > >>>>
> > > > > >>>>
> > > > > > >>>> https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/
> > > > > >>>>
> > > > > >>>> Thank you~
> > > > > >>>>
> > > > > >>>> Xintong Song
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > > >>>> On Tue, Mar 16, 2021 at 7:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>>> Actually, I think it's 9 packages, not 7.
> > > > > >>>>>
> > > > > >>>>> Check here for the 1.12.2 packages.
> > > > > >>>>> https://pypi.org/project/apache-flink/#files
> > > > > >>>>>
> > > > > >>>>> Thank you~
> > > > > >>>>>
> > > > > >>>>> Xintong Song
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > > >>>>> On Tue, Mar 16, 2021 at 7:08 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > >>>>>
> > > > > > >>>>>> Am I reading this correctly that we publish 7 different
> > > > > > >>>>>> artifacts just for Python?
> > > > > >>>>>> What does the release matrix look like?
> > > > > >>>>>>
> > > > > >>>>>> On 3/16/2021 3:45 AM, Dian Fu wrote:
> > > > > >>>>>>> Hi Xingbo,
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>> Thanks a lot for bringing up this discussion. Actually, the
> > > > > > >>>>>>> size limit already became an issue when releasing 1.11.3
> > > > > > >>>>>>> and 1.12.1. It blocked us from publishing the PyFlink
> > > > > > >>>>>>> packages to PyPI during the release because there was not
> > > > > > >>>>>>> enough space left (PS: the packages have since been
> > > > > > >>>>>>> published, after the size limit was increased).
> > > > > > >>>>>>> Considering that the total package size is about 1.5GB
> > > > > > >>>>>>> (220MB * 7) for each release, it makes sense to split the
> > > > > > >>>>>>> PyFlink package. It could reduce the total package size to
> > > > > > >>>>>>> about 250MB (3MB * 7 + 220MB) for each release. We would
> > > > > > >>>>>>> not need to increase the size limit again in the next few
> > > > > > >>>>>>> years, as we currently still have about 7.5GB of space
> > > > > > >>>>>>> left.
> > > > > >>>>>>> So +1 from my side.
> > > > > >>>>>>>
> > > > > >>>>>>> Regards,
> > > > > >>>>>>> Dian
> > > > > >>>>>>>
> > > > > >>>>>>>> 2021年3月12日 下午2:30,Xingbo Huang <hxbks...@gmail.com> 写道:
> > > > > >>>>>>>>
> > > > > >>>>>>>> Hi everyone,
> > > > > >>>>>>>>
> > > > > > >>>>>>>> Since release-1.11, PyFlink has included Cython support,
> > > > > > >>>>>>>> and we release 7 packages (for different platforms and
> > > > > > >>>>>>>> Python versions) to PyPI for each release. Each package is
> > > > > > >>>>>>>> more than 200MB because we need to bundle the jar files
> > > > > > >>>>>>>> into it. The entire project space on PyPI grows very
> > > > > > >>>>>>>> fast, and we frequently need to apply to PyPI for more
> > > > > > >>>>>>>> project space. Please refer to
> > > > > > >>>>>>>> [https://github.com/pypa/pypi-support/issues/831]
> > > > > > >>>>>>>> for more details.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> The root cause of this problem is that we bundle the jar
> > > > > > >>>>>>>> files in each package. This is actually unnecessary if we
> > > > > > >>>>>>>> extract the jar files into a separate package dedicated
> > > > > > >>>>>>>> to holding them.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> I'd like to propose splitting the pyflink package into
> > > > > > >>>>>>>> two packages: the original apache-flink and
> > > > > > >>>>>>>> apache-flink-libraries (any suggestions for the name?).
> > > > > > >>>>>>>> The apache-flink-libraries package contains only jar
> > > > > > >>>>>>>> files, and there is only one apache-flink-libraries
> > > > > > >>>>>>>> package for each release. The apache-flink package
> > > > > > >>>>>>>> depends on apache-flink-libraries, so users still only
> > > > > > >>>>>>>> need to install apache-flink and nothing changes for
> > > > > > >>>>>>>> them. We still need to release multiple wheel packages of
> > > > > > >>>>>>>> apache-flink. However, they will be very small, as they
> > > > > > >>>>>>>> no longer contain the jar files.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Looking forward to your feedback.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>>
> > > > > >>>>>>>> Xingbo
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
