[
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062606#comment-14062606
]
Sergey Beryozkin commented on TIKA-1367:
----------------------------------------
Thanks for the proposal, though I'm not sure it would help. Consider a user who
doesn't necessarily know what 'grep' is, for example someone working on
Windows. Ideally, as a user I'd like an easy way to solve this typical
dependency issue: "My application will work with PDFs and OpenDocument docs
only, how can I get everything but the relevant dependencies excluded?". I know
some source and Maven based searching can yield some info, but that is not
something every user can be expected to be able to do.
For the record, here's what I see after grepping dependency:tree:
{noformat}
[INFO] +- org.apache.tika:tika-core:jar:1.6-SNAPSHOT:compile
[INFO] +- org.gagravarr:vorbis-java-tika:jar:0.6:compile
[INFO] +- edu.ucar:netcdf:jar:4.2.20:compile
[INFO] | +- edu.ucar:unidataCommon:jar:4.2.20:compile
[INFO] | | \- net.jcip:jcip-annotations:jar:1.0:compile
[INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | \- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] +- net.sourceforge.jmatio:jmatio:jar:1.0:compile
[INFO] +- org.apache.james:apache-mime4j-core:jar:0.7.2:compile
[INFO] +- org.apache.james:apache-mime4j-dom:jar:0.7.2:compile
[INFO] +- org.apache.commons:commons-compress:jar:1.8:compile
[INFO] | \- org.tukaani:xz:jar:1.5:compile
[INFO] +- commons-codec:commons-codec:jar:1.5:compile
[INFO] +- org.apache.pdfbox:pdfbox:jar:1.8.6:compile
[INFO] | +- org.apache.pdfbox:fontbox:jar:1.8.6:compile
[INFO] | +- org.apache.pdfbox:jempbox:jar:1.8.6:compile
[INFO] | \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
[INFO] +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
[INFO] +- org.apache.poi:poi:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-scratchpad:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-ooxml:jar:3.10-FINAL:compile
[INFO] | +- org.apache.poi:poi-ooxml-schemas:jar:3.10-FINAL:compile
[INFO] | | \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
[INFO] | \- dom4j:dom4j:jar:1.6.1:compile
[INFO] +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
[INFO] +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
[INFO] +- org.ow2.asm:asm-debug-all:jar:4.1:compile
[INFO] +- com.googlecode.mp4parser:isoparser:jar:1.0-RC-1:compile
[INFO] | \- org.aspectj:aspectjrt:jar:1.6.11:compile
[INFO] +- com.drewnoakes:metadata-extractor:jar:2.6.2:compile
[INFO] | +- com.adobe.xmp:xmpcore:jar:5.1.2:compile
[INFO] | \- xerces:xercesImpl:jar:2.8.1:compile
[INFO] |   \- xml-apis:xml-apis:jar:1.3.03:compile
[INFO] +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
[INFO] +- rome:rome:jar:1.0:compile
[INFO] | \- jdom:jdom:jar:1.0:compile
[INFO] +- org.gagravarr:vorbis-java-core:jar:0.6:compile
[INFO] +- com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3:compile
[INFO] +- com.uwyn:jhighlight:jar:1.0:compile
[INFO] +- com.pff:java-libpst:jar:0.8.1:compile
{noformat}
It's difficult to know where to start excluding. As a user I have no idea what
many of those dependencies are for, or whether some of them are needed by all
Parser implementations. It's easy enough to spot what the PDF Parser will need
(pdfbox), but trickier to see what else might be needed for PDF as well as
for other types.
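To illustrate the manual work involved, an exclusion in a consumer's pom.xml
looks roughly like the sketch below. The tika-parsers version is assumed to
match the tika-core version in the tree above, and the two excluded artifacts
(netcdf and java-libpst) are picked from that tree purely as examples; whether
they can be dropped safely for a PDF and OpenDocument only application is
exactly what a user cannot tell today.
{code:xml}
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <!-- Assumption: same version as the tika-core entry in the tree above -->
  <version>1.6-SNAPSHOT</version>
  <exclusions>
    <!-- Illustrative only: not verified as safe to exclude for PDF/ODF use -->
    <exclusion>
      <groupId>edu.ucar</groupId>
      <artifactId>netcdf</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.pff</groupId>
      <artifactId>java-libpst</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}
Excluding a direct dependency of tika-parsers this way also drops whatever it
pulls in transitively along that path (for netcdf: unidataCommon,
commons-httpclient and slf4j-api), unless another dependency brings the same
artifact back in.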
> Tika documentation should list tika-parsers parser dependencies
> ---------------------------------------------------------------
>
> Key: TIKA-1367
> URL: https://issues.apache.org/jira/browse/TIKA-1367
> Project: Tika
> Issue Type: Improvement
> Components: documentation
> Reporter: Sergey Beryozkin
> Fix For: 1.6
>
>
> tika-parsers module has many strong transitive parser dependencies. Maven
> users of tika-parsers have to exclude all the transitive dependencies
> manually. Documenting the list of the existing transitive dependencies and
> keeping the list up to date will help developers exclude the libraries not
> needed for a given project.