Kevin Risden created SOLR-16010:
-----------------------------------
Summary: langid should include all required Tika dependencies
Key: SOLR-16010
URL: https://issues.apache.org/jira/browse/SOLR-16010
Project: Solr
Issue Type: Task
Security Level: Public (Default Security Level. Issues are Public)
Components: contrib - LangId
Reporter: Kevin Risden
Currently, the langid module requires the extraction module to be loaded for
langid to work. It isn't clear whether what is included in the extraction module
even meets langid's needs (i.e., tika-langdetect isn't included in the
extraction module).
{code:java}
➜ solr git:(SOLR-15989) find solr/packaging/build/solr-10.0.0-SNAPSHOT/ -name '*tika*.jar'
solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/langid/lib/tika-core-1.27.jar
solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-parsers-1.27.jar
solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-java7-1.27.jar
solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-xmp-1.27.jar
solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/vorbis-java-tika-0.8.jar
solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-core-1.27.jar
{code}
This came out of a discussion in SOLR-15989 -
https://github.com/apache/solr/pull/621#discussion_r806083202
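
To make the gap easier to see, the find output above can be grouped by module. The following is only a rough sketch: it assumes the modules/<module>/lib/*.jar layout shown in the listing, and the helper names (tika_jars_by_module, missing_from) are hypothetical, not part of Solr or its build.

```python
from collections import defaultdict
from pathlib import Path


def tika_jars_by_module(dist_root):
    """Map each module name to the set of Tika-related jar names in its lib/ dir.

    Assumption: jars live under <dist_root>/modules/<module>/lib/, as in the
    find output quoted in this issue.
    """
    jars = defaultdict(set)
    for jar in Path(dist_root).glob("modules/*/lib/*tika*.jar"):
        module = jar.parent.parent.name  # .../modules/<module>/lib/<jar>
        jars[module].add(jar.name)
    return dict(jars)


def missing_from(jars, module, other):
    """Return the sorted jar names that `other` ships but `module` does not."""
    return sorted(jars.get(other, set()) - jars.get(module, set()))
```

Calling missing_from(tika_jars_by_module("solr/packaging/build/solr-10.0.0-SNAPSHOT"), "langid", "extraction") would list the Tika jars that langid currently only gets by way of the extraction module. It cannot flag tika-langdetect, since that jar is in neither module.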