Tim Allison created SOLR-11721:
----------------------------------
Summary: Isolate Tika and dependencies into separate jvm
Key: SOLR-11721
URL: https://issues.apache.org/jira/browse/SOLR-11721
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Tim Allison
Tika should not be run in the same JVM as Solr. Ever.
Upgrading Tika by hand, getting every dependency right and hoping to avoid jar
hell, is, um, error-prone. See my recent failure, SOLR-11622, for which I
apologize profusely.
Running DIH against Tika's unit test documents has been eye-opening. It has
revealed some other version conflict/dependency failures that should have been
caught much earlier.
The fix is non-trivial, but we should work towards it.
I see two options:
1. TIKA-2514 -- Tika's current ForkParser offers a model for a minimal fork
process + server option. Its current limitation is that all parsers and
dependencies must be serializable, which can be a problem for users who add
their own parsers with dependencies that were not designed for serialization.
The proposal there is to rework the ForkParser so that it loads all
dependencies from a TIKA_HOME directory.
2. SOLR-7632 -- use tika-server, but make it seamless and as easy (and secure!)
to use as the current handlers.
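To make option 2 a bit more concrete, here is a rough sketch of what the Solr
side could look like if extraction were delegated to a tika-server running in
its own JVM. It is only an illustration, not a proposed patch: the class name
TikaServerClient, the timeouts, and the use of the JDK 11+ java.net.http client
are my assumptions. The one thing taken from tika-server itself is that a PUT
of the raw document to /tika returns the extracted text (the server listens on
port 9998 by default).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical client: Solr keeps no Tika jars and talks HTTP to a separate JVM. */
public class TikaServerClient {
    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5)) // assumed value
            .build();
    private final URI base;

    public TikaServerClient(URI base) {
        this.base = base;
    }

    /** Builds the PUT /tika request; kept separate so it can be checked without a server. */
    public static HttpRequest buildRequest(URI base, byte[] document) {
        return HttpRequest.newBuilder(base.resolve("/tika"))
                .timeout(Duration.ofSeconds(60)) // assumed per-document limit
                .header("Accept", "text/plain")  // ask for plain extracted text
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    /** Sends the document and returns the extracted text, or throws on a non-200 reply. */
    public String extractText(byte[] document) throws IOException, InterruptedException {
        HttpResponse<String> resp =
                http.send(buildRequest(base, document), HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + resp.statusCode());
        }
        return resp.body();
    }
}
```

With something like this, a parser crash, OOM, or jar conflict stays in the
tika-server process, and Solr only sees a failed or timed-out HTTP call. The
"seamless and secure" part (starting the server, restricting who can reach it)
is not covered by the sketch.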
Other thoughts, recommendations?