+1 agreed for Tika and deprecating langdetect. 

I was surprised to see how old langdetect was!


> On Mar 6, 2023, at 4:43 PM, Alessandro Benedetti <a.benede...@sease.io> wrote:
> 
> +1 for delegating to Tika, which is a much better place for that (and one
> that they are actively evolving).
> 
> +1 for deprecating the old and not updated plugins as well (langdetect)
> 
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
> 
> e-mail: a.benede...@sease.io
> 
> 
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
> 
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
> 
> 
> On Thu, 2 Mar 2023 at 20:22, Jan Høydahl <jan....@cominvent.com> wrote:
> 
>> Hi,
>> 
>> Solr supports pluggable language detectors <https://solr.apache.org/guide/solr/latest/indexing-guide/language-detection.html>:
>> 
>>> Solr supports three implementations of this feature:
>>> 
>>> Tika’s language detection feature: https://tika.apache.org/1.28.4/detection.html
>>> LangDetect language detection: https://github.com/shuyo/language-detection
>>> OpenNLP language detection: http://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.langdetect
>> 
>> Since our first implementation, the Tika project <
>> https://tika.apache.org/2.7.0/detection.html#Language_Detection> has
>> evolved its language detection capabilities and added a pluggable
>> architecture as well:
>> https://github.com/apache/tika/tree/main/tika-langdetect
>> 
>> One of Solr's langid plugins is "langdetect" which has not been updated in
>> 10 years. I'd like to deprecate it and remove it in main for that reason.
>> 
>> Longer-term question: Does it make sense for us to keep maintaining our
>> own set of language detectors in this landscape?
>> We could re-purpose the langid module so that it uses Tika's pluggable
>> detectors in some way, perhaps with thin wrapper classes in Solr?
>> 
>> Wdyt?
>> 
>> Jan

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.
