That is a good point.
The problem is relevant for our binary release as well, since we package the
same set of libraries for the binary and Python packages.
For now, I suggest we include all the packages from bin.xml. For the next
release, we need to run systematic experiments to remove unnecessary
libraries from the binary and Python packages.
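
A minimal sketch of where such an experiment could start, assuming the
packaged jars sit unpacked in a single directory. The helper names
(jar_sizes, matching_total) are hypothetical and not part of SystemDS, and
the glob is matched against jar file names, which only approximates the
groupId:artifactId patterns used in bin.xml.

```python
import fnmatch
import os


def jar_sizes(lib_dir):
    """Return {jar file name: size in bytes} for the jars directly in lib_dir."""
    sizes = {}
    for name in os.listdir(lib_dir):
        path = os.path.join(lib_dir, name)
        if name.endswith(".jar") and os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def matching_total(sizes, pattern):
    """Sum the sizes (bytes) of all jars whose file name matches a glob pattern."""
    return sum(
        size for name, size in sizes.items() if fnmatch.fnmatch(name, pattern)
    )
```

Calling matching_total(jar_sizes(lib_dir), "hadoop-client*") on the unpacked
library directory would show how many bytes a single include pattern
contributes, which can then be compared against the 100MB limit.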

Regards,
Arnab..

On Thu, Jun 23, 2022 at 5:50 PM Baunsgaard, Sebastian
<baunsga...@tugraz.at.invalid> wrote:

> Hi,
>
> To verify this, we have our current "no environment test" in Python, which
> verifies which packages are needed and which are not.
>
> Unfortunately, right now our main branch fails because of missing packages,
> so I am making it bigger.
> Ideally we would not need to pack any of the Hadoop things into the Python
> package.
> Currently the system requires Hadoop jars because we import Hadoop packages
> in many places in our code base where it could potentially be avoided.
>
> best regards
> Sebastian
>
> ________________________________
> From: Janardhan <janard...@apache.org>
> Sent: Thursday, June 23, 2022 5:14:24 PM
> To: dev@systemds.apache.org
> Subject: Re: [DISCUSS] PyPi packages are more than 100 MB.
>
> Hi team,
>
> In the list attached before, the following
>
> 19MB - hadoop-client-api-3.3.1.jar [1]
> 31MB - hadoop-client-runtime-3.3.1.jar [2]
>
> are added, which were introduced in Hadoop 3.x.
>
> These jars are added to the bin packaging by the
> `<include>*:hadoop-client*</include>` [3] line in bin.xml. It has not
> changed recently.
>
> Are these libraries intentional and important for the binary release? Is
> it possible to remove them?
>
>
> [1] https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api
> [2]
> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-runtime
> [3]
> https://github.com/apache/systemds/blame/main/src/assembly/bin.xml#L100
>
> Thanks,
> Janardhan
>
>
>
> On Tue, Jun 21, 2022 at 11:40 PM arnab phani <phaniar...@gmail.com> wrote:
> >
> > I thought we only include the libraries from the SystemDS binary in the
> > Python package. If so, then the hadoop-* libraries are not new additions.
> > Unfortunately, test.pypi doesn't allow packages of more than 100MB, which
> > means we won't be able to dry run our Python releases.
> > I would be a little more comfortable with a better explanation for why
> > the Python package size increased by 2x from the last release.
> >
> > Regards,
> > Arnab..
> >
> > On Tue, Jun 21, 2022 at 6:55 PM Janardhan <janard...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > PyPi packages are a little more than 100MB, compared to 2.2.1, which
> > > is ~56 MB.
> > >
> > > -- Added in the present release (library sizes after unzip)
> > >
> > >   70K Jun 21 15:08 commons-compiler-3.0.16.jar
> > >  601K Jun 21 15:08 commons-compress-1.19.jar
> > >  193K Jun 21 15:08 commons-text-1.6.jar
> > >   19M Jun 21 15:08 hadoop-client-api-3.3.1.jar
> > >   31M Jun 21 15:08 hadoop-client-runtime-3.3.1.jar
> > >  5.3M Jun 21 15:08 hadoop-hdfs-client-3.3.1.jar
> > >  1.5M Jun 21 15:08 htrace-core4-4.1.0-incubating.jar
> > >  126K Jun 21 15:08 re2j-1.1.jar
> > >  192K Jun 21 15:08 stax2-api-4.2.1.jar
> > >  511K Jun 21 15:08 woodstox-core-5.3.0.jar
> > >
> > > Let us see if there is some optimization we can do.
> > >
> > > Best,
> > > Janardhan
> > >
>
