[ 
https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286111#comment-14286111
 ] 

Hoss Man commented on SOLR-6991:
--------------------------------

bq. In fact this is not TIKA's issue, and it's not new: a lot of stuff around 
Hadoop in Solr fails with Turkish!

...my point is: it's new to Solr.

in all other cases where the POSIX_SPAWN / turkish locale problem impacts Solr, 
either:
* we deal with it in the solr code, so we give the user a meaningful error 
explaining the problem (e.g. SystemInfoHandler), or
* it's in an optional feature that has *NEVER* worked with turkish, e.g. the 
hadoop / morphlines contribs, which have not worked with the turkish locale 
since the first version in which they were available in Solr
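The turkish failures come down to locale-sensitive case mapping of ASCII 'i'/'I'. As a minimal sketch, assuming some step on the process-spawning path case-converts the launch mechanism name with the default locale (the exact JDK code path isn't shown in this thread), the Java below compares `Locale.ROOT` against `tr-TR`. `matchesExpectedName` and the ASCII name it compares against are hypothetical stand-ins, not JDK APIs.

```java
import java.util.Locale;

public class TurkishCasingDemo {

    // Hypothetical check: lower-cases an enum-style mechanism name with the
    // given locale and compares it with the ASCII name a caller expects.
    static boolean matchesExpectedName(String enumName, Locale locale) {
        return enumName.toLowerCase(locale).equals("posix_spawn");
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale.ROOT lower-cases 'I' to ASCII 'i', so the names match.
        System.out.println(matchesExpectedName("POSIX_SPAWN", Locale.ROOT)); // true

        // Turkish lower-cases 'I' to dotless i (U+0131), so they don't.
        System.out.println(matchesExpectedName("POSIX_SPAWN", turkish)); // false

        // The reverse direction breaks too: 'i' upper-cases to U+0130.
        System.out.println("posix_spawn".toUpperCase(turkish).equals("POSIX_SPAWN")); // false
    }
}
```

Any code that round-trips an ASCII identifier through `toUpperCase()`/`toLowerCase()` without `Locale.ROOT` is exposed in the same way.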

...in this case, we're talking about an _existing_ solr feature that worked 
fine when running older versions of Solr with turkish, and now users upgrading 
to 5.0 are going to get a weird error message.

if there's nothing better we can do to keep the ExtractionRequestHandler working 
for users who upgrade (even if they run with turkish), then i'm fine with 
assumes in the tests and notes in the docs ... i was just hoping you'd have a 
better idea.
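For the "assumes in the tests and notes in the docs" fallback, a probe-style check avoids hard-coding a list of locale codes. This is a hypothetical sketch (`TurkishLocaleGuard`, `hasNonAsciiICasing` and `describeProblem` are not Solr or Lucene APIs); its result could serve as the condition of a JUnit `Assume.assumeTrue(...)` call, or back a meaningful error message in the spirit of what SystemInfoHandler does.

```java
import java.util.Locale;

public class TurkishLocaleGuard {

    // True if the locale does not map ASCII 'i' <-> 'I' one-to-one, as with
    // Turkish (dotted capital I U+0130, dotless small i U+0131).
    static boolean hasNonAsciiICasing(Locale locale) {
        return !"i".toUpperCase(locale).equals("I")
            || !"I".toLowerCase(locale).equals("i");
    }

    // Hypothetical user-facing message, instead of a raw stack trace.
    static String describeProblem(Locale locale) {
        return "Default locale " + locale.toLanguageTag()
            + " uses non-ASCII case mapping for 'i'/'I'; features that launch"
            + " external processes may fail under this locale.";
    }

    public static void main(String[] args) {
        Locale current = Locale.getDefault();
        if (hasNonAsciiICasing(current)) {
            System.out.println(describeProblem(current));
        } else {
            System.out.println("Locale " + current.toLanguageTag() + " looks safe.");
        }
    }
}
```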

in particular: I'm still wondering if we can leverage the classpath to override 
the "default" TesseractOCRConfig.properties file in the tika-parsers jar with 
our own version that prevents tesseract from being used.  (i agree it's not 
worth switching to explicitly whitelisting the parsers in Solr code, but is 
there an easy way to blacklist this parser and/or other parsers we know are 
problematic?)
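Whether a Solr-provided TesseractOCRConfig.properties can shadow the one in the tika-parsers jar depends on search order: a `URLClassLoader` returns the first matching resource among its URLs, in the order they were given. The sketch below uses two temporary directories as stand-ins for a Solr override location and the tika-parsers jar. The resource path `org/apache/tika/parser/ocr/TesseractOCRConfig.properties` and the `source` key are assumptions for illustration, and so is the premise that Tika reads the file through a loader that sees both entries; that would need checking against Solr's contrib classloader setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ClasspathOverrideDemo {

    // Assumed resource path; the real location inside tika-parsers is not verified here.
    static final String RESOURCE =
        "org/apache/tika/parser/ocr/TesseractOCRConfig.properties";

    // Creates an empty directory that stands in for one classpath entry.
    static Path newRoot(String prefix) {
        try {
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a properties file at RESOURCE beneath the given root.
    static void writeResource(Path root, String body) {
        try {
            Path file = root.resolve(RESOURCE);
            Files.createDirectories(file.getParent());
            Files.write(file, body.getBytes(StandardCharsets.ISO_8859_1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads RESOURCE through a loader that searches the roots in the given order.
    static Properties loadFirst(Path... roots) {
        try {
            URL[] urls = new URL[roots.length];
            for (int i = 0; i < roots.length; i++) {
                urls[i] = roots[i].toUri().toURL(); // existing directories yield URIs ending in '/'
            }
            Properties props = new Properties();
            // A null parent means only the bootstrap loader is consulted before our URLs.
            try (URLClassLoader loader = new URLClassLoader(urls, null);
                 InputStream in = loader.getResourceAsStream(RESOURCE)) {
                if (in != null) {
                    props.load(in);
                }
            }
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path solrOverride = newRoot("solr-override");
        Path tikaDefault = newRoot("tika-default");
        writeResource(solrOverride, "source=solr-override\n");
        writeResource(tikaDefault, "source=tika-default\n");

        // The entry listed first wins; swapping the order flips the result.
        System.out.println(loadFirst(solrOverride, tikaDefault).getProperty("source")); // solr-override
        System.out.println(loadFirst(tikaDefault, solrOverride).getProperty("source")); // tika-default
    }
}
```

Class-relative lookups (`SomeClass.class.getResourceAsStream(...)`) resolve through the defining class's loader, so an override would only take effect if it sits on that same loader's path ahead of the jar.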


> Update to Apache TIKA 1.7
> -------------------------
>
>                 Key: SOLR-6991
>                 URL: https://issues.apache.org/jira/browse/SOLR-6991
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 5.0, Trunk, 5.1
>
>         Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch
>
>
> Apache TIKA 1.7 was released: 
> [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt]
> This is more or less a dependency update, so mostly just replacements. Not 
> sure if we should do this for 5.0. In 5.0 we currently have the previous 
> version, which has not yet been released with Solr. If we bring this into 
> 5.0 now, we would avoid shipping a new TIKA version twice in a row. I can 
> make the change this evening and let it bake in 5.x, so maybe we backport it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
