[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062606#comment-14062606
 ] 

Sergey Beryozkin commented on TIKA-1367:
----------------------------------------

Thanks for the proposal, though I'm not sure it would help. Consider a user 
who does not necessarily know what 'grep' is, for example someone working on 
Windows. Ideally, as a user, I'd like an easy way to solve this typical 
dependency issue: "My application will work with PDFs and OpenDocument docs 
only, how can I get all but the relevant dependencies excluded?". I know some 
source and Maven based searching can yield some info, but it is not something 
every user can be expected to be able to do. 
For the record, here's what I see after grepping the dependency:tree output:

{noformat}
[INFO] +- org.apache.tika:tika-core:jar:1.6-SNAPSHOT:compile
[INFO] +- org.gagravarr:vorbis-java-tika:jar:0.6:compile
[INFO] +- edu.ucar:netcdf:jar:4.2.20:compile
[INFO] |  +- edu.ucar:unidataCommon:jar:4.2.20:compile
[INFO] |  |  \- net.jcip:jcip-annotations:jar:1.0:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] +- net.sourceforge.jmatio:jmatio:jar:1.0:compile
[INFO] +- org.apache.james:apache-mime4j-core:jar:0.7.2:compile
[INFO] +- org.apache.james:apache-mime4j-dom:jar:0.7.2:compile
[INFO] +- org.apache.commons:commons-compress:jar:1.8:compile
[INFO] |  \- org.tukaani:xz:jar:1.5:compile
[INFO] +- commons-codec:commons-codec:jar:1.5:compile
[INFO] +- org.apache.pdfbox:pdfbox:jar:1.8.6:compile
[INFO] |  +- org.apache.pdfbox:fontbox:jar:1.8.6:compile
[INFO] |  +- org.apache.pdfbox:jempbox:jar:1.8.6:compile
[INFO] |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
[INFO] +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
[INFO] +- org.apache.poi:poi:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-scratchpad:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-ooxml:jar:3.10-FINAL:compile
[INFO] |  +- org.apache.poi:poi-ooxml-schemas:jar:3.10-FINAL:compile
[INFO] |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
[INFO] |  \- dom4j:dom4j:jar:1.6.1:compile
[INFO] +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
[INFO] +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
[INFO] +- org.ow2.asm:asm-debug-all:jar:4.1:compile
[INFO] +- com.googlecode.mp4parser:isoparser:jar:1.0-RC-1:compile
[INFO] |  \- org.aspectj:aspectjrt:jar:1.6.11:compile
[INFO] +- com.drewnoakes:metadata-extractor:jar:2.6.2:compile
[INFO] |  +- com.adobe.xmp:xmpcore:jar:5.1.2:compile
[INFO] |  \- xerces:xercesImpl:jar:2.8.1:compile
[INFO] |     \- xml-apis:xml-apis:jar:1.3.03:compile
[INFO] +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
[INFO] +- rome:rome:jar:1.0:compile
[INFO] |  \- jdom:jdom:jar:1.0:compile
[INFO] +- org.gagravarr:vorbis-java-core:jar:0.6:compile
[INFO] +- com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3:compile
[INFO] +- com.uwyn:jhighlight:jar:1.0:compile
[INFO] +- com.pff:java-libpst:jar:0.8.1:compile

{noformat}

It's a difficult task to start excluding. As a user, I have no idea what many 
of those dependencies are for, or whether some of them are needed by all 
Parser implementations. It's easy enough to spot what the PDF Parser needs 
(pdfbox), but trickier to see what else might be needed for PDF as well as 
for other types.
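
For illustration, a manual exclusion in a pom.xml would look something like 
the sketch below. The choice of netcdf and the POI artifacts is only an 
assumption that they are unrelated to PDF and OpenDocument parsing, which is 
exactly the kind of guess a user should not have to make. The tika-parsers 
version is likewise assumed to match the tika-core version in the listing 
above.

{noformat}
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <!-- Assumed: same version as tika-core in the listing above -->
  <version>1.6-SNAPSHOT</version>
  <exclusions>
    <!-- Assumption: NetCDF support is not needed for PDF/OpenDocument -->
    <exclusion>
      <groupId>edu.ucar</groupId>
      <artifactId>netcdf</artifactId>
    </exclusion>
    <!-- Assumption: POI is only needed for Microsoft Office formats -->
    <exclusion>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-scratchpad</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{noformat}

Excluding a first-level dependency this way also drops its transitive 
dependencies (for example unidataCommon under netcdf), unless another 
dependency still pulls them in. Every remaining entry in the tree would need 
the same kind of decision.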

> Tika documentation should list tika-parsers parser dependencies
> ---------------------------------------------------------------
>
>                 Key: TIKA-1367
>                 URL: https://issues.apache.org/jira/browse/TIKA-1367
>             Project: Tika
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Sergey Beryozkin
>             Fix For: 1.6
>
>
> tika-parsers module has many strong transitive parser dependencies. Maven 
> users of tika-parsers have to exclude all the transitive dependencies 
> manually. Documenting the list of the existing transitive dependencies and 
> keeping the list up to date will help developers exclude the libraries not 
> needed for a given project.



