Markus, I’m confused about exactly what operations you’re performing - could you provide your field type?

In particular, I don’t understand why you can’t just rewrite the synonyms file entry

    word1 => word2

to:

    word1 => word1, word2

(Clearly I’m missing something about how stemming is involved.)

--
Steve
www.lucidworks.com

> On Dec 21, 2017, at 9:28 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> Hello Steve,
>
> Well, that is an interesting approach to the topic indeed. But i do not think
> it is possible to obtain a list of all inflected forms for all words that
> also have roots in some synonym file, the stemmers are not reversible.
>
> Any other ideas?
>
> Thanks,
> Markus
>
> -----Original message-----
>> From:Steve Rowe <sar...@gmail.com>
>> Sent: Thursday 21st December 2017 0:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>>
>> Hi Markus,
>>
>> My suggestion: rewrite your synonyms to include the triggering word in the
>> expanded synonyms list. That way you won’t need
>> KeywordRepeat/RemoveDuplicates filters, and mm=100% will work as you expect.
>>
>> I don’t think this situation is a bug, since mm applies to the built query,
>> not to the original query terms.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Dec 20, 2017, at 5:02 PM, Markus Jelsma <markus.jel...@openindex.io>
>>> wrote:
>>>
>>> Hello,
>>>
>>> Yes of course, index time synonyms lessens the query time complexity and
>>> will solve the mm problem. It also screws IDF and the flexibility of adding
>>> synonyms on demand. The first we do not want, the second is impossible for
>>> us (very large main search index).
>>>
>>> We are looking for a solution with mm that takes KeywordRepeat, stemming
>>> and synonym expansion into consideration. To me the current working of mm
>>> in this case is a bug, i input one term so treat it as one term in mm,
>>> regardless of expanded query terms.
>>>
>>> Any query time ideas to share? I am not well versed with the actual code
>>> dealing with this specific subject, the code doesn't like me. I am fine if
>>> someone points me to the code that tells mm about the number of original
>>> input terms, and what to do. If someone does, please also explain why the
>>> change i want to make is a bad one, what to be aware of or what to beware
>>> of, or what to take into account.
>>>
>>> Also, am i the only one who regards this behaviour as a bug, or more
>>> subtle, a weird unexpected behaviour?
>>>
>>> Many many thanks!
>>> Markus
>>>
>>> -----Original message-----
>>>> From:Shawn Heisey <apa...@elyograg.org>
>>>> Sent: Wednesday 20th December 2017 22:39
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>>>>
>>>> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
>>>>> I have an interesting issue with mm and SynonymQuery and
>>>>> KeywordRepeatFilter. We do query time synonym expansion and use
>>>>> KeywordRepeat for not only finding stemmed tokens. Our synonyms are
>>>>> already preprocessed and contain only stemmed tokens. Synonym file
>>>>> contains: traject,verbind
>>>>>
>>>>> So, any non-root stem that ends up in a synonym is actually a search for
>>>>> three terms: +DisjunctionMaxQuery(((title_nl:trajecten
>>>>> Synonym(title_nl:traject title_nl:verbind))))
>>>>>
>>>>> But, our default mm requires that two terms must match if the input query
>>>>> consists of two terms: 2<-1 5<-2 6<90%
>>>>>
>>>>> So, a simple query looking for a plural (trajecten) will not match a
>>>>> document where the title contains only its singular form: q=trajecten
>>>>> will not match document with title_nl:"een traject"
>>>>
>>>> I would think that doing synonym expansion at index time would remove
>>>> any possible confusion about the number of terms at query time. Queries
>>>> that involve synonyms will be slightly less complex, but the index would
>>>> be larger, so it's difficult to say whether those kinds of queries would
>>>> be any faster or not.
>>>>
>>>> There is one clear disadvantage to index-time synonym expansion: If you
>>>> change your synonyms, you have to reindex.
>>>>
>>>> Thanks,
>>>> Shawn
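
For reference, the mm arithmetic discussed in this thread can be sketched as below. This is a rough Python approximation of the documented mm semantics, not Solr's actual code. The function name min_should_match is made up for illustration, and how many optional clauses the expanded query really counts as (two or three) is an assumption here.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Approximate how many optional clauses an mm spec requires.

    Hypothetical helper: it follows the documented mm rules (integers,
    percentages, and conditional "N<expr" parts) but is not Solr's code.
    """
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional parts are checked left to right; each "N<expr"
        # applies only when there are more than N optional clauses.
        for part in spec.split():
            bound, expr = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, expr)
        return result
    if spec.endswith("%"):
        raw = clause_count * int(spec[:-1]) / 100
    else:
        raw = int(spec)
    # Negative values mean "all but this many"; fractions are truncated.
    result = clause_count + int(raw) if raw < 0 else int(raw)
    return max(result, 0)


SPEC = "2<-1 5<-2 6<90%"  # the default mm quoted in the thread

for clauses in (1, 2, 3):
    print(clauses, "optional clauses ->", min_should_match(clauses, SPEC), "required")
```

With this spec, one clause requires one match, while two or three clauses both require two. So if the single input term trajecten is counted as more than one optional clause after KeywordRepeat and synonym expansion, a title containing only "traject" cannot satisfy mm, which fits the behaviour reported above.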