Re: [discovery] This quarter: researching new language analysers for search

Federico Leva (Nemo) Thu, 05 Jan 2017 01:34:23 -0800

David Causse, 05/01/2017 09:36:

Indeed from time to time I have to read lsearch2 code to understand what
was done before cirrus was deployed.

:)

Concerning Russian, I think we do; apparently lsearchd used a simple
wrapper around the Lucene Russian stemmer [1]. If there is other
custom code, or if you are aware of any regressions, I'd appreciate some
links so we can track them. I remember having seen some code (JS
gadgets?) that does some custom Russian stemming...

I remember seeing some file with long lists of rules for Cyrillic, but maybe it was SerbianFilter.java.


Concerning Hebrew, I hope we can find a good analyzer. According to the
comments in the code, the Hebrew analyzer that was tested appeared to be
unstable and was disabled.


Ah, makes sense. Thanks!

Nemo

_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
