[ 
https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286307#comment-14286307
 ] 

Uwe Schindler commented on SOLR-6991:
-------------------------------------

One trick might work:
TIKA always prefers "external" parsers loaded via SPI. The trick here would be 
to add a /META-INF/services/... file that lists a subclass of the Tesseract 
parser that always returns "no supported media types". TIKA would then use our 
subclass in preference to the shipped one, which would effectively disable the 
parser. I have not checked this, and it would be yet another hack (one that I 
don't like either).
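
To make the idea concrete, here is a minimal sketch of the override in plain 
Java. It deliberately uses stand-in types rather than TIKA's classes: the 
Parser interface, ShippedOcrParser, DisabledOcrParser and the resolve() rule 
are all hypothetical names invented for illustration. In particular, resolve() 
only models the assumed behavior ("a registered subclass is preferred over the 
shipped parser"), which, as said above, has not been checked against TIKA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OcrDisableSketch {

    /** Stand-in for a parser SPI; this is NOT TIKA's real interface. */
    interface Parser {
        Set<String> getSupportedTypes();
    }

    /** Stand-in for the OCR parser that ships with the library. */
    static class ShippedOcrParser implements Parser {
        @Override
        public Set<String> getSupportedTypes() {
            return new HashSet<>(Arrays.asList("image/png", "image/jpeg"));
        }
    }

    /** Our override: a subclass that claims no media types at all. */
    static class DisabledOcrParser extends ShippedOcrParser {
        @Override
        public Set<String> getSupportedTypes() {
            return Collections.emptySet();
        }
    }

    /**
     * Models the ASSUMED preference rule: a parser is dropped when a
     * subclass of it is also registered. Real TIKA may behave differently.
     */
    static List<Parser> resolve(List<Parser> discovered) {
        List<Parser> active = new ArrayList<>();
        for (Parser candidate : discovered) {
            boolean shadowed = false;
            for (Parser other : discovered) {
                Class<?> c = candidate.getClass();
                Class<?> o = other.getClass();
                if (c != o && c.isAssignableFrom(o)) {
                    shadowed = true; // a subclass of this parser is present
                    break;
                }
            }
            if (!shadowed) {
                active.add(candidate);
            }
        }
        return active;
    }

    /** True if any active parser claims the given media type. */
    static boolean canHandle(List<Parser> parsers, String mediaType) {
        for (Parser p : parsers) {
            if (p.getSupportedTypes().contains(mediaType)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Parser> shippedOnly = Arrays.asList(new ShippedOcrParser());
        List<Parser> withOverride =
                Arrays.asList(new ShippedOcrParser(), new DisabledOcrParser());
        System.out.println("shipped only handles image/png: "
                + canHandle(resolve(shippedOnly), "image/png"));
        System.out.println("with override handles image/png: "
                + canHandle(resolve(withOverride), "image/png"));
    }
}
```

In a real setup the subclass would extend the actual Tesseract parser and be 
listed in the /META-INF/services/... file mentioned above. Whether TIKA then 
really drops the shipped parser is exactly the part that still needs checking.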

> Update to Apache TIKA 1.7
> -------------------------
>
>                 Key: SOLR-6991
>                 URL: https://issues.apache.org/jira/browse/SOLR-6991
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 5.0, Trunk, 5.1
>
>         Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch
>
>
> Apache TIKA 1.7 was released: 
> [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt]
> This is more or less a dependency update, so mostly replacements. Not sure 
> if we should do this for 5.0. In 5.0 we currently have the previous version, 
> which has not yet been released with Solr. If we bring this into 5.0 now, we 
> would avoid introducing a new TIKA version in two releases in a row. I can 
> make the changes this evening and let them bake in 5.x, so maybe we backport 
> this.



