[ https://issues.apache.org/jira/browse/SOLR-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492643#comment-17492643 ]
Jan Høydahl commented on SOLR-16010:
------------------------------------
I just tested langid on main
{code:java}
SOLR_MODULES=langid bin/solr start -c
bin/solr create -c test
curl -X POST -H 'Content-type:application/json' -d '{
  "add-updateprocessor": {
    "name": "langid",
    "class": "org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory",
    "langid.fl": "title",
    "langid.langField": "language_s"
  }
}' http://localhost:8983/solr/test/config
# Post some docs
{code}
This works, although the old TikaLanguageIdentifier is not very good: it needs
a lot of text to detect anything. LangDetectLanguageIdentifier is better. Both
work with our current 9.0 lib/ folder, so there is no need for further Tika
dependencies or for any dependency on the extraction module.
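For comparison, a minimal sketch of the same setup using the LangDetect-based identifier, written as a static solrconfig.xml chain instead of a Config API call. This is an illustration under assumptions, not something tested here: the chain name `langid` is hypothetical, the `title` and `language_s` fields are reused from the command above, and the `<lst name="defaults">` layout follows the pattern in the Solr Reference Guide's language detection page.

{code:xml}
<!-- Hypothetical chain name "langid"; fields reused from the Config API example -->
<updateRequestProcessorChain name="langid">
  <!-- LangDetect-based identifier instead of the Tika-based one -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <!-- Field(s) whose text is used for detection -->
      <str name="langid.fl">title</str>
      <!-- Field that receives the detected language code -->
      <str name="langid.langField">language_s</str>
    </lst>
  </processor>
  <!-- Standard tail of an update chain: log, then actually index the doc -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
{code}

Documents would then be routed through this chain, for example by adding `update.chain=langid` to the update request, with the langid module still enabled via SOLR_MODULES as above.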
> langid should include all required Tika dependencies
> ----------------------------------------------------
>
> Key: SOLR-16010
> URL: https://issues.apache.org/jira/browse/SOLR-16010
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - LangId
> Reporter: Kevin Risden
> Priority: Major
>
> Currently, the langid module requires the extraction module to be loaded for
> langid to work. It isn't clear whether what is included in the extraction
> module will even meet langid's needs (i.e., tika-langdetect isn't included in
> the extraction module).
> {code:java}
> ➜ solr git:(SOLR-15989) find solr/packaging/build/solr-10.0.0-SNAPSHOT/ -name '*tika*.jar'
> solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/langid/lib/tika-core-1.27.jar
> solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-parsers-1.27.jar
> solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-java7-1.27.jar
> solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-xmp-1.27.jar
> solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/vorbis-java-tika-0.8.jar
> solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-core-1.27.jar
> {code}
> This came out of a discussion in SOLR-15989 -
> https://github.com/apache/solr/pull/621#discussion_r806083202
--
This message was sent by Atlassian Jira
(v8.20.1#820001)