Hi,

Solr supports pluggable language detectors <https://solr.apache.org/guide/solr/latest/indexing-guide/language-detection.html>:
> Solr supports three implementations of this feature:
>
> Tika’s language detection feature: https://tika.apache.org/1.28.4/detection.html
> LangDetect language detection: https://github.com/shuyo/language-detection
> OpenNLP language detection: http://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.langdetect

Since our first implementation, the Tika project <https://tika.apache.org/2.7.0/detection.html#Language_Detection> has evolved its language detection capabilities and has also added a pluggable architecture: https://github.com/apache/tika/tree/main/tika-langdetect

One of Solr's langid plugins is "langdetect", which is built on a library that has not been updated in 10 years. For that reason, I'd like to deprecate it and remove it in main.

Longer-term question: does it make sense for us to keep maintaining our own set of language detectors in this landscape? We could re-purpose the langid module so that it uses Tika's pluggable detectors in some way, perhaps with thin wrapper classes in Solr.

Wdyt?

Jan