[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286033#comment-14286033 ]
Hoss Man commented on SOLR-6991:
--------------------------------
TIKA-93 introduced the TesseractOCRParser, and TIKA-1476 enabled it as a
default parser.
That combination means that the first time Tika is used in Solr, the
TesseractOCRParser checks whether Tesseract is installed on the system
("hasTesseract") to decide whether that parser should be consulted. That check
goes through ExternalParser.check, which calls Runtime.exec, and Runtime.exec
blows up in the Turkish locale.
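The comment does not say why Runtime.exec fails under Turkish. A commonly reported cause (an assumption here, not something stated above) is locale-sensitive String.toUpperCase() applied to a lowercase identifier somewhere in the JDK's process-launch path. The sketch below reproduces that failure pattern with a hypothetical enum, LaunchMechanism, that stands in for such a setting; it is not the real JDK class.

```java
import java.util.Locale;

/**
 * Illustration only: a locale-sensitive toUpperCase() can turn a valid
 * lowercase identifier into a string that no longer matches an enum constant
 * when Turkish casing rules are in effect.
 */
public class TurkishUpperCaseDemo {

    // Hypothetical enum standing in for a setting parsed from a lowercase
    // string; NOT the actual JDK-internal type.
    enum LaunchMechanism { POSIX_SPAWN, FORK, VFORK }

    // Upper-cases with the given locale, then looks up the enum constant.
    static LaunchMechanism parse(String name, Locale locale) {
        return LaunchMechanism.valueOf(name.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale-neutral casing: "posix_spawn" becomes "POSIX_SPAWN".
        System.out.println(parse("posix_spawn", Locale.ROOT)); // prints POSIX_SPAWN

        // Turkish casing maps 'i' to dotted capital I (U+0130), producing
        // "POS\u0130X_SPAWN", which valueOf() rejects.
        try {
            parse("posix_spawn", turkish);
        } catch (IllegalArgumentException e) {
            System.out.println("tr-TR failed: " + e.getMessage());
        }
    }
}
```

Code that relies on the JVM default locale behaves the same way when the whole JVM runs in tr_TR, which is why the failure shows up only for Turkish users.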
----
Possible resolutions I can think of:
* change how we init Tika to prevent this parser from ever being used (override
the list of autodetected parsers?)
* change how we include the Tika jars/defaults to prevent this parser from ever
being used (maybe by somehow overriding the default Tesseract properties file
in the jar?)
* roll back to Tika 1.6
* punt, and advise Turkish users to run their JVM in en_US?
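If the last option were chosen, the advice could be backed by a startup check. The sketch below is a hypothetical guard (the class LocaleGuard and method hasAsciiSafeCasing are illustrative names, not part of Solr or Tika) that detects default locales whose casing of ASCII 'i'/'I' differs from English. The suggested remedy uses the standard user.language and user.country system properties, which set the JVM default locale at startup.

```java
import java.util.Locale;

/**
 * Hypothetical startup guard: warns when the default locale's case mapping
 * would break ASCII identifiers (as in Turkish or Azerbaijani).
 */
public class LocaleGuard {

    // True if 'i' upper-cases to 'I' and 'I' lower-cases to 'i', as in English.
    static boolean hasAsciiSafeCasing(Locale locale) {
        return "i".toUpperCase(locale).equals("I")
            && "I".toLowerCase(locale).equals("i");
    }

    public static void main(String[] args) {
        Locale current = Locale.getDefault();
        if (hasAsciiSafeCasing(current)) {
            System.out.println("Default locale " + current + " looks safe.");
        } else {
            System.out.println("Default locale " + current
                + " uses non-ASCII casing for 'i'; consider starting the JVM with"
                + " -Duser.language=en -Duser.country=US");
        }
    }
}
```

This only detects the problem; it does not fix code that relies on the default locale, so it fits the "punt" option rather than replacing the other three.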
> Update to Apache TIKA 1.7
> -------------------------
>
> Key: SOLR-6991
> URL: https://issues.apache.org/jira/browse/SOLR-6991
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 5.0, Trunk, 5.1
>
> Attachments: SOLR-6991.patch, SOLR-6991.patch
>
>
> Apache TIKA 1.7 was released:
> [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt]
> This is more or less a dependency update, so it mostly means replacing the
> JARs. Not sure if we should do this for 5.0. In 5.0 we currently have the
> previous version, which has not yet been released with Solr. If we bring this
> into 5.0 now, we would avoid shipping a new Tika version in two consecutive
> releases. I can make the change this evening and let it bake in 5.x, and then
> maybe we backport it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)