[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074175#comment-16074175
 ] 

Gus Heck commented on TIKA-1367:
--------------------------------

So when the dust settles here, how will one build a coherent, workable one-jar 
application that supports code like this, which makes a best effort to parse 
any document it might encounter?

{code}
      Tika tika = new Tika();
      tika.setMaxStringLength(document.getRawData().length);
      Metadata metadata = new Metadata();
      try (ByteArrayInputStream bais = new ByteArrayInputStream(document.getRawData())) {
        String textContent = tika.parseToString(bais, metadata);
        document.setRawData(textContent.getBytes(StandardCharsets.UTF_8));
        for (String name : metadata.names()) {
          document.put(sanitize(name) + plusSuffix(), metadata.get(name));
        }
      } catch (IOException | TikaException e) {
        log.warn("Tika processing failure!", e);
        // if tika can't parse it we certainly don't want random binary crap in
        // the index
        document.setStatus(Status.DROPPED);
      }
{code}

Although I notice that this is not marked as fixed yet, in 1.15 the above code 
no longer compiles... (and somehow Gradle reports no transitive dependencies 
for tika-parsers):
{code}

compile - Dependencies for source set 'main'.
+--- org.apache.tika:tika-parsers:1.15
+--- org.apache.solr:solr-solrj:5.5.0
|    +--- commons-io:commons-io:2.4
{code}
vs
{code}
+--- org.apache.tika:tika-parsers:1.12
|    +--- org.apache.tika:tika-core:1.12
|    +--- org.gagravarr:vorbis-java-tika:0.6
|    |    \--- org.apache.tika:tika-core:1.5 -> 1.12
|    +--- com.healthmarketscience.jackcess:jackcess:2.1.2
{code}

This looks very likely to break everything: if Gradle doesn't see the 
dependencies, one-jar won't package them. (All I did to cause this was change 
1.12 to 1.15 in the Gradle build.)
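
One way to catch this kind of packaging gap early is a startup check that 
tries to load a few classes the application expects to find in the jar. The 
sketch below uses only the JDK and is not part of Tika's API. The class name 
ClasspathCheck and the particular list of classes probed are assumptions 
chosen for illustration, so substitute the parsers your application actually 
relies on.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    // Returns the subset of classNames that cannot be loaded by the
    // class loader that loaded this class.
    static List<String> missingClasses(String... classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only test for presence, do not run
                // static initializers of the probed class
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative selection (an assumption): one class from tika-core,
        // one from tika-parsers, one from a transitive parser dependency.
        List<String> missing = missingClasses(
                "org.apache.tika.Tika",
                "org.apache.tika.parser.pdf.PDFParser",
                "org.apache.pdfbox.pdmodel.PDDocument");
        if (missing.isEmpty()) {
            System.out.println("All expected classes are on the classpath.");
        } else {
            System.out.println("Missing from the packaged jar: " + missing);
        }
    }
}
```

Running this from inside the packaged jar should show whether the transitive 
dependencies made it in. It only tests that classes can be found, not that 
parsing works.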

> Tika documentation should list tika-parsers parser dependencies
> ---------------------------------------------------------------
>
>                 Key: TIKA-1367
>                 URL: https://issues.apache.org/jira/browse/TIKA-1367
>             Project: Tika
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Sergey Beryozkin
>             Fix For: 1.16
>
>
> tika-parsers module has many strong transitive parser dependencies. Maven 
> users of tika-parsers have to exclude all the transitive dependencies 
> manually. Documenting the list of the existing transitive dependencies and 
> keeping the list up to date will help developers exclude the libraries not 
> needed for a given project.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
