[ https://issues.apache.org/jira/browse/ANY23-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942306#comment-16942306 ]

Lewis John McGibbney commented on ANY23-447:
--------------------------------------------

Yes, I am keen to reduce bloat, particularly in the transitive dependencies 
pulled in through tika-parsers. [~davidcockbill], if you could provide a patch 
against the master branch, it would be appreciated.

> Reduce Any23 dependency bloat
> -----------------------------
>
>                 Key: ANY23-447
>                 URL: https://issues.apache.org/jira/browse/ANY23-447
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.3
>            Reporter: David Cockbill
>            Priority: Minor
>
> Prompted by an email conversation with Hans Brende:
> {code:java}
> David, unfortunately this move won't reduce the number of core dependencies
> we have: the plugins and service modules are not dependencies of the core
> module. However, it might be useful if you posted an issue about the
> dependency bloat, including the various exclusions you are using: we might
> be able to mitigate the problem.
> {code}
> This arose from having to exclude dependencies in the pom.xml for a 
> product (note that not much thought went into the exclusions; I was 
> trying to get the code size down before a release). Section of the pom.xml:
> {code:xml}
>     <dependency>
>       <groupId>org.apache.any23</groupId>
>       <artifactId>apache-any23-core</artifactId>
>         <exclusions>
>           <!-- Any23 brings in a lot of dependencies, which bloats the shaded jar.
>                This is an attempt to reduce that by excluding packages
>                that we may not be using as part of Any23.
>                NOTE: If an excluded dependency is required at runtime,
>                a java.lang.NoClassDefFoundError is thrown. -->
>           
>           <exclusion>
>             <groupId>org.apache.tika</groupId>
>             <artifactId>tika-parsers</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.bouncycastle</groupId>
>             <artifactId>bcmail-jdk15on</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.bouncycastle</groupId>
>             <artifactId>bcprov-jdk15on</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>edu.ucar</groupId>
>             <artifactId>cdm</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>net.sf.trove4j</groupId>
>             <artifactId>trove4j</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.cxf</groupId>
>             <artifactId>cxf-rt-rs-client</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>com.github.ben-manes.caffeine</groupId>
>             <artifactId>caffeine</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.opengis</groupId>
>             <artifactId>geoapi</artifactId>
>           </exclusion>  
>           <exclusion>
>             <groupId>com.drewnoakes</groupId>
>             <artifactId>metadata-extractor</artifactId>
>           </exclusion> 
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-repository-sail</artifactId>
>           </exclusion> 
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-sail-memory</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.tukaani</groupId>
>             <artifactId>xz</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.codelibs</groupId>
>             <artifactId>jhighlight</artifactId>
>           </exclusion> 
>           <exclusion>
>             <groupId>org.gagravarr</groupId>
>             <artifactId>vorbis-java-core</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.gagravarr</groupId>
>             <artifactId>vorbis-java-tika</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.opennlp</groupId>
>             <artifactId>opennlp-tools</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.pdfbox</groupId>
>             <artifactId>pdfbox</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.pdfbox</groupId>
>             <artifactId>pdfbox-tools</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.poi</groupId>
>             <artifactId>poi-scratchpad</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>edu.ucar</groupId>
>             <artifactId>grib</artifactId>
>           </exclusion>  
>           <exclusion>
>             <groupId>com.googlecode.mp4parser</groupId>
>             <artifactId>isoparser</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>com.healthmarketscience.jackcess</groupId>
>             <artifactId>jackcess</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>com.healthmarketscience.jackcess</groupId>
>             <artifactId>jackcess-encrypt</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.sis.core</groupId>
>             <artifactId>sis-utility</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.sis.storage</groupId>
>             <artifactId>sis-netcdf</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.apache.sis.core</groupId>
>             <artifactId>sis-metadata</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-rio-trix</artifactId>
>           </exclusion>
>           <exclusion>
>             <groupId>org.yaml</groupId>
>             <artifactId>snakeyaml</artifactId>
>           </exclusion>        
>           <exclusion>
>             <groupId>org.eclipse.rdf4j</groupId>
>             <artifactId>rdf4j-rio-turtle</artifactId>
>           </exclusion>         
>         </exclusions>
>     </dependency>
> {code}
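The NOTE in the exclusion comment is the main risk with this approach: Maven builds fine without the excluded artifacts, and the gap only surfaces as a java.lang.NoClassDefFoundError when some code path needs them. One quick check is to probe the assembled classpath (for example, when running against the shaded jar) for one representative class per excluded artifact. The sketch below assumes that approach; `ExclusionCheck` is a hypothetical helper, and the probe class names are assumptions picked as plausible representatives of a few excluded artifacts, not a list taken from Any23.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: reports which classes are (not) loadable from the
 * runtime classpath. The class names in main() are illustrative assumptions.
 */
public class ExclusionCheck {

    /** Returns true if the named class can be located and loaded by the given loader. */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            // initialize=false: load the class without running its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is absent;
            // LinkageError (e.g. NoClassDefFoundError): something it depends on is absent
            return false;
        }
    }

    /** Maps each class name, in input order, to whether it is loadable. */
    static Map<String, Boolean> check(String[] classNames, ClassLoader loader) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name, loader));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed representative classes for some of the excluded artifacts
        String[] probes = {
            "org.apache.tika.parser.pdf.PDFParser",      // tika-parsers (1.x)
            "org.apache.pdfbox.pdmodel.PDDocument",      // pdfbox
            "org.yaml.snakeyaml.Yaml",                   // snakeyaml
            "org.eclipse.rdf4j.rio.turtle.TurtleParser"  // rdf4j-rio-turtle
        };
        ClassLoader loader = ExclusionCheck.class.getClassLoader();
        check(probes, loader).forEach((name, present) ->
            System.out.println((present ? "present  " : "MISSING  ") + name));
    }
}
```

A MISSING line is expected for an artifact excluded on purpose; it only confirms the exclusion took effect. Whether Any23 actually needs that artifact for the extractors in use can only be established by exercising them, e.g. with an end-to-end extraction test run against the final jar.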
> Some background that may be useful from my notes:
> {code:java}
> Whilst adding Any23 to the product, the Any23 Core package caused Lintian 
> to fail.
> Lintian is a Debian package checker written in Perl. It uses Archive::Zip 
> to unpack any .jar file in the Debian package, and that unzip library does 
> not handle the Zip64 format, which caused the failure. The original zip 
> format has various restrictions, one of which is the number of entries in 
> the archive: at most 65535. Therefore, if the number of files in the 
> product jar exceeds this limit, a Zip64 format file is produced instead of 
> a standard zip file.
> The Any23 Core library does seem quite excessive in what it pulls in. Running 
> the following, the entry count for the product jar goes from 40490 to 78513, 
> well past the 65535 limit:
> zipinfo -1 product.jar | wc -l
> {code}
> This Lintian failure on a Linux build was the original push for the 
> exclusions; however, the size of the product .jar also grew in a similar fashion.
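The entry count in the notes comes from `zipinfo -1 product.jar | wc -l`. The same check can be run from Java, for instance as a build-time test, so that a jar creeping past the classic zip limit is caught before it reaches Lintian. A minimal sketch, assuming the JDK's `java.util.zip.ZipFile` (which reads both classic and Zip64 archives since Java 7); `JarEntryCount` is a hypothetical class name:

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

/**
 * Counts the entries in a jar/zip file, roughly what
 * `zipinfo -1 product.jar | wc -l` reports, and flags whether the count
 * exceeds what the classic (non-Zip64) zip format can record.
 */
public class JarEntryCount {

    /** Largest value of the classic format's 16-bit entry-count field. */
    static final int CLASSIC_ZIP_MAX_ENTRIES = 0xFFFF; // 65535

    /** Returns the number of entries (files and directories) in the archive. */
    static int countEntries(File jar) throws IOException {
        try (ZipFile zip = new ZipFile(jar)) {
            return zip.size();
        }
    }

    /** True if the entry count cannot fit in a classic zip archive. */
    static boolean needsZip64(int entryCount) {
        return entryCount > CLASSIC_ZIP_MAX_ENTRIES;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarEntryCount <path-to-jar>");
            System.exit(2);
            return;
        }
        int count = countEntries(new File(args[0]));
        System.out.println(count + " entries"
            + (needsZip64(count) ? " (exceeds 65535: Zip64 required)" : ""));
    }
}
```

Like `zipinfo -1`, `ZipFile.size()` counts directory entries as well as files, so a jar with 78513 entries would be flagged as needing Zip64.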



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
