[ https://issues.apache.org/jira/browse/ANY23-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942306#comment-16942306 ]
Lewis John McGibbney commented on ANY23-447:
--------------------------------------------

Yes, I am keen to reduce bloat, particularly in the transitive dependencies pulled in through tika-parsers. If you could provide a patch against the master branch, [~davidcockbill], it would be appreciated.

> Reduce Any23 dependency bloat
> -----------------------------
>
>                 Key: ANY23-447
>                 URL: https://issues.apache.org/jira/browse/ANY23-447
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.3
>            Reporter: David Cockbill
>            Priority: Minor
>
> Prompted by an email conversation with Hans Brende:
> {code:java}
> David, unfortunately this move won't reduce the number of core dependencies
> we have: the plugins and service modules are not dependencies of the core
> module. However, it might be useful if you posted an issue about the
> dependency bloat, including the various exclusions you are using: we might
> be able to mitigate the problem.
> {code}
> This was a result of having to exclude dependencies in the pom.xml for a
> product (note that not much thought went into the exclusions; I was
> trying to get the code size down before a release). Section of pom.xml:
> {code:java}
> <dependency>
>   <groupId>org.apache.any23</groupId>
>   <artifactId>apache-any23-core</artifactId>
>   <exclusions>
>     <!-- Any23 brings in a lot of dependencies, which bloats the shaded jar.
>          This is an attempt to reduce this by excluding packages
>          that we may not be using as part of Any23.
>          NOTE: If an excluded dependency is required at runtime, then a
>          java.lang.NoClassDefFoundError is thrown.
>     -->
>
>     <exclusion>
>       <groupId>org.apache.tika</groupId>
>       <artifactId>tika-parsers</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.bouncycastle</groupId>
>       <artifactId>bcmail-jdk15on</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.bouncycastle</groupId>
>       <artifactId>bcprov-jdk15on</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>edu.ucar</groupId>
>       <artifactId>cdm</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>net.sf.trove4j</groupId>
>       <artifactId>trove4j</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.cxf</groupId>
>       <artifactId>cxf-rt-rs-client</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>com.github.ben-manes.caffeine</groupId>
>       <artifactId>caffeine</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.opengis</groupId>
>       <artifactId>geoapi</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>com.drewnoakes</groupId>
>       <artifactId>metadata-extractor</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.eclipse.rdf4j</groupId>
>       <artifactId>rdf4j-repository-sail</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.eclipse.rdf4j</groupId>
>       <artifactId>rdf4j-sail-memory</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.tukaani</groupId>
>       <artifactId>xz</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.codelibs</groupId>
>       <artifactId>jhighlight</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.gagravarr</groupId>
>       <artifactId>vorbis-java-core</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.gagravarr</groupId>
>       <artifactId>vorbis-java-tika</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.opennlp</groupId>
>       <artifactId>opennlp-tools</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.pdfbox</groupId>
>       <artifactId>pdfbox</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.pdfbox</groupId>
>       <artifactId>pdfbox-tools</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.poi</groupId>
>       <artifactId>poi-scratchpad</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>edu.ucar</groupId>
>       <artifactId>grib</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>com.googlecode.mp4parser</groupId>
>       <artifactId>isoparser</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>com.healthmarketscience.jackcess</groupId>
>       <artifactId>jackcess</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>com.healthmarketscience.jackcess</groupId>
>       <artifactId>jackcess-encrypt</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.sis.core</groupId>
>       <artifactId>sis-utility</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.sis.storage</groupId>
>       <artifactId>sis-netcdf</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.apache.sis.core</groupId>
>       <artifactId>sis-metadata</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.eclipse.rdf4j</groupId>
>       <artifactId>rdf4j-rio-trix</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.yaml</groupId>
>       <artifactId>snakeyaml</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>org.eclipse.rdf4j</groupId>
>       <artifactId>rdf4j-rio-turtle</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> {code}
> Some background that may be useful from my notes:
> {code:java}
> Whilst adding Any23 to the product, the Any23 Core package was causing Lintian
> to fail.
> Lintian is a Debian package checker written in Perl. This package uses
> Archive::Zip to unpack any .jar file in the Debian package. This particular
> unzip utility does not handle the Zip64 format, which causes the failure. The
> original zip format has various restrictions, one of which is the maximum number
> of files in the archive. Therefore, if the number of files in the jar for the
> product exceeds this limit (65535), then a Zip64 format file is produced
> instead of a standard zip file.
> The Any23 Core Library does seem quite excessive in what it pulls in. From
> running the following, the output for the product goes from 40490 to 78513.
> zipinfo -1 product.jar | wc -l
> {code}
> This Lintian failure on a Linux build was the original push for the
> exclusions; however, the product .jar also increased in a similar fashion.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
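The entry-count check described in the reporter's notes can be sketched in Python as below. This is only an illustration, not part of the issue or of Any23: the names `ZIP_MAX_ENTRIES`, `count_entries` and `exceeds_classic_zip_limit` are hypothetical, and the threshold is simply the 65535 figure quoted in the notes. It counts archive entries much as `zipinfo -1 product.jar | wc -l` does, and reports whether that count is above the limit.

```python
import zipfile

# Assumption: 65535 is the entry limit quoted in the notes above
# (the classic zip format records the entry count in a 16-bit field).
ZIP_MAX_ENTRIES = 65535


def count_entries(jar_path):
    """Return the number of entries (files and directories) in a jar/zip."""
    with zipfile.ZipFile(jar_path) as jar:
        return len(jar.infolist())


def exceeds_classic_zip_limit(entry_count, limit=ZIP_MAX_ENTRIES):
    """Return True if the entry count is above the classic zip limit."""
    return entry_count > limit
```

With the figures from the notes, `exceeds_classic_zip_limit(40490)` is false and `exceeds_classic_zip_limit(78513)` is true, which matches the report that the jar only became a Zip64 archive once Any23 Core and its transitive dependencies were added.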