Hi team,

In the list attached earlier, the following two jars were added. Both were introduced with Hadoop 3.x:

19MB - hadoop-client-api-3.3.1.jar [1]
31MB - hadoop-client-runtime-3.3.1.jar [2]

These jars are added to the bin packaging by the line `<include>*:hadoop-client*</include>` [3] in bin.xml, and that line has not changed recently. Are these libraries intentional and important for the binary release? Is it possible to remove them?

[1] https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api
[2] https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-runtime
[3] https://github.com/apache/systemds/blame/main/src/assembly/bin.xml#L100

Thanks,
Janardhan

On Tue, Jun 21, 2022 at 11:40 PM arnab phani <phaniar...@gmail.com> wrote:
>
> I thought we only include the libraries from the SystemDS binary in the python
> package. If so, then the hadoop-* libraries are not new additions.
> Unfortunately, test.pypi doesn't allow packages of more than 100MB, which
> means we won't be able to dry-run our python releases.
> I would be a little more comfortable with a better explanation for why the
> python package size doubled since the last release.
>
> Regards,
> Arnab
>
> On Tue, Jun 21, 2022 at 6:55 PM Janardhan <janard...@apache.org> wrote:
>
> > Hi,
> >
> > The PyPI packages are a little more than 100MB, compared to ~56MB for 2.2.1.
> >
> > -- Added in the present release (library sizes after unzip)
> >
> >  70K Jun 21 15:08 commons-compiler-3.0.16.jar
> > 601K Jun 21 15:08 commons-compress-1.19.jar
> > 193K Jun 21 15:08 commons-text-1.6.jar
> >  19M Jun 21 15:08 hadoop-client-api-3.3.1.jar
> >  31M Jun 21 15:08 hadoop-client-runtime-3.3.1.jar
> > 5.3M Jun 21 15:08 hadoop-hdfs-client-3.3.1.jar
> > 1.5M Jun 21 15:08 htrace-core4-4.1.0-incubating.jar
> > 126K Jun 21 15:08 re2j-1.1.jar
> > 192K Jun 21 15:08 stax2-api-4.2.1.jar
> > 511K Jun 21 15:08 woodstox-core-5.3.0.jar
> >
> > Let us see if there is some optimization we can do.
> >
> > Best,
> > Janardhan
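To put numbers on the question of how much of the growth the new libraries explain, the sizes listed in the thread can be tallied with a small script. This is a rough sketch, not a measurement of the wheel itself: it assumes the `K`/`M` suffixes are the 1024-based units that `ls -lh` prints, and the helper names (`NEW_JARS`, `to_kib`, `total_mib`) are hypothetical.

```python
from typing import Dict

# Sizes (after unzip) of the libraries listed as new in this release,
# copied from the listing in the thread. Assumption: K and M are the
# 1024-based units printed by `ls -lh`.
NEW_JARS: Dict[str, str] = {
    "commons-compiler-3.0.16.jar": "70K",
    "commons-compress-1.19.jar": "601K",
    "commons-text-1.6.jar": "193K",
    "hadoop-client-api-3.3.1.jar": "19M",
    "hadoop-client-runtime-3.3.1.jar": "31M",
    "hadoop-hdfs-client-3.3.1.jar": "5.3M",
    "htrace-core4-4.1.0-incubating.jar": "1.5M",
    "re2j-1.1.jar": "126K",
    "stax2-api-4.2.1.jar": "192K",
    "woodstox-core-5.3.0.jar": "511K",
}

# Multiplier that converts each suffix to KiB.
UNITS = {"K": 1, "M": 1024}


def to_kib(size: str) -> float:
    """Convert an `ls -h` style size such as '5.3M' to KiB."""
    return float(size[:-1]) * UNITS[size[-1]]


def total_mib(jars: Dict[str, str]) -> float:
    """Sum the sizes of the given jars and return the total in MiB."""
    return sum(to_kib(size) for size in jars.values()) / 1024


if __name__ == "__main__":
    total = total_mib(NEW_JARS)
    # Only the two shaded client jars pulled in by `*:hadoop-client*`.
    hadoop_client = total_mib(
        {name: size for name, size in NEW_JARS.items()
         if name.startswith("hadoop-client-")}
    )
    print(f"all new jars:    {total:.1f} MiB")
    print(f"hadoop-client-*: {hadoop_client:.1f} MiB ({hadoop_client / total:.0%})")
```

Under those assumptions the listed jars add up to roughly 58 MiB, of which the two `hadoop-client-*` jars account for 50 MiB (about 86%).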