+1, agreed on Tika and on deprecating langdetect. I was surprised to see how old langdetect was!

> On Mar 6, 2023, at 4:43 PM, Alessandro Benedetti <a.benede...@sease.io> wrote:
>
> +1 for delegating to Tika, which is a much better place for that (and which
> they are actively evolving).
>
> +1 for deprecating the old, no-longer-updated plugins as well (langdetect).
>
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>
>
> On Thu, 2 Mar 2023 at 20:22, Jan Høydahl <jan....@cominvent.com> wrote:
>
>> Hi,
>>
>> Solr supports pluggable language detectors <https://solr.apache.org/guide/solr/latest/indexing-guide/language-detection.html>:
>>
>>> Solr supports three implementations of this feature:
>>>
>>> Tika’s language detection feature: https://tika.apache.org/1.28.4/detection.html
>>> LangDetect language detection: https://github.com/shuyo/language-detection
>>> OpenNLP language detection: http://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.langdetect
>>
>> Since our first implementation, the Tika project <https://tika.apache.org/2.7.0/detection.html#Language_Detection>
>> has evolved its language detection capabilities and added a pluggable
>> architecture as well:
>> https://github.com/apache/tika/tree/main/tika-langdetect
>>
>> One of Solr's langid plugins is "langdetect", which has not been updated in
>> 10 years. I'd like to deprecate it and remove it in main for that reason.
>>
>> Longer term question: Does it make sense for us to keep maintaining our
>> own set of language detectors in this landscape?
>> We could re-purpose the langid module so that it uses Tika's pluggable
>> detectors in some way, perhaps with thin wrapper classes in Solr?
>>
>> Wdyt?
>>
>> Jan

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.