Hi,

To verify this, we have our current "no environment test" in Python, which
checks which packages are needed and which are not.

Unfortunately, our main branch currently fails because of missing packages, so I
am making it bigger.
Ideally, we would not need to pack any of the Hadoop jars into the Python
package.
Currently, the system requires the Hadoop jars because we import Hadoop packages
in many places in our code base where this could potentially be avoided.

Best regards,
Sebastian

________________________________
From: Janardhan <janard...@apache.org>
Sent: Thursday, June 23, 2022 5:14:24 PM
To: dev@systemds.apache.org
Subject: Re: [DISCUSS] PyPi packages are more than 100 MB.

Hi team,

In the list attached earlier, the following

19MB - hadoop-client-api-3.3.1.jar [1]
31MB - hadoop-client-runtime-3.3.1.jar [2]

are added; these were introduced in Hadoop 3.x.

These jars are added to the bin packaging by the
`<include>*:hadoop-client*</include>` [3] line in bin.xml. It has not
changed recently.

Are these libraries intentional and important for the binary release? Is
it possible to remove them?
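
To put a number on this, the two hadoop-client jars alone account for
19 MB + 31 MB = 50 MB after unzip. Below is a minimal sketch for measuring that
share on an unpacked package. The helper names `jar_sizes` and `size_report` and
the directory argument are hypothetical and not part of SystemDS; the file-name
pattern only approximates the `*:hadoop-client*` include, which matches Maven
artifact coordinates rather than file names.

```python
import sys
from fnmatch import fnmatch
from pathlib import Path


def jar_sizes(lib_dir):
    """Return {jar file name: size in bytes} for every jar directly in lib_dir."""
    return {p.name: p.stat().st_size for p in Path(lib_dir).glob("*.jar")}


def size_report(sizes, pattern="hadoop-client*"):
    """Split the total size into jars whose name matches `pattern` and the rest."""
    matched = sum(size for name, size in sizes.items() if fnmatch(name, pattern))
    total = sum(sizes.values())
    return {"matched": matched, "other": total - matched, "total": total}


if __name__ == "__main__":
    # Hypothetical usage: pass the directory holding the unpacked jars;
    # falls back to the current directory when no argument is given.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    sizes = jar_sizes(lib_dir)
    # Largest jars first, sizes shown in MiB.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 2**20:8.1f} MB  {name}")
    print(size_report(sizes))
```

Note that `hadoop-hdfs-client-3.3.1.jar` does not match the default pattern, so
it is counted under "other".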


[1] https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api
[2] https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-runtime
[3] https://github.com/apache/systemds/blame/main/src/assembly/bin.xml#L100


Thanks,
Janardhan



On Tue, Jun 21, 2022 at 11:40 PM arnab phani <phaniar...@gmail.com> wrote:
>
> I thought we only include the libraries from the SystemDS binary in the python
> package. If so, then the hadoop-* libraries are not new additions.
> Unfortunately, test.pypi doesn't allow packages of more than 100MB, which
> means we won't be able to dry run our python releases.
> I would be a little more comfortable with a better explanation for why the
> python package size increased by 2x from the last release.
>
> Regards,
> Arnab..
>
> On Tue, Jun 21, 2022 at 6:55 PM Janardhan <janard...@apache.org> wrote:
>
> > Hi,
> >
> > PyPi packages are a little more than 100MB, compared to 2.2.1, which is ~56 MB.
> >
> > -- Added in the present release (library sizes after unzip)
> >
> >   70K Jun 21 15:08 commons-compiler-3.0.16.jar
> >  601K Jun 21 15:08 commons-compress-1.19.jar
> >  193K Jun 21 15:08 commons-text-1.6.jar
> >   19M Jun 21 15:08 hadoop-client-api-3.3.1.jar
> >   31M Jun 21 15:08 hadoop-client-runtime-3.3.1.jar
> >  5.3M Jun 21 15:08 hadoop-hdfs-client-3.3.1.jar
> >  1.5M Jun 21 15:08 htrace-core4-4.1.0-incubating.jar
> >  126K Jun 21 15:08 re2j-1.1.jar
> >  192K Jun 21 15:08 stax2-api-4.2.1.jar
> >  511K Jun 21 15:08 woodstox-core-5.3.0.jar
> >
> > Let us see if there is some optimization we can do.
> >
> > Best,
> > Janardhan
> >
