Markus,

I’m confused about exactly what operations you’re performing - could you 
provide your field type?

In particular, I don’t understand why you can’t just rewrite the synonyms file 
entry

  word1 => word2

to:

  word1 => word1, word2

(Clearly I’m missing something about how stemming is involved.)
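
For illustration only, here is a rough sketch of that rewrite as a script. It assumes the usual synonyms.txt conventions ("=>" for explicit mappings, "#" for comments, commas between terms). The helper name include_trigger is made up for this example and is not part of Solr. It only handles the simple single-source case and leaves everything else untouched:

```python
def include_trigger(line: str) -> str:
    """Rewrite 'a => b' as 'a => a, b'; return any other line unchanged.

    Hypothetical helper for illustration, not a Solr API.
    """
    stripped = line.strip()
    # Skip blanks, comments, and plain equivalence lists such as "traject,verbind".
    if not stripped or stripped.startswith("#") or "=>" not in stripped:
        return line
    lhs, rhs = stripped.split("=>", 1)
    sources = [t.strip() for t in lhs.split(",") if t.strip()]
    targets = [t.strip() for t in rhs.split(",") if t.strip()]
    # Multi-source rules ("a, b => c") would need one rule per source; leave them alone.
    if len(sources) != 1:
        return line
    # Put the triggering word first unless it is already among the targets.
    if sources[0] not in targets:
        targets.insert(0, sources[0])
    return f"{sources[0]} => {', '.join(targets)}"


if __name__ == "__main__":
    for rule in ["word1 => word2", "traject,verbind", "# a comment"]:
        print(include_trigger(rule))
```

Whether this helps in your case depends on where stemming happens relative to the synonym filter, which is the part I don't yet understand.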

--
Steve
www.lucidworks.com

> On Dec 21, 2017, at 9:28 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> 
> Hello Steve,
> 
> Well, that is an interesting approach to the topic indeed. But I do not think 
> it is possible to obtain a list of all inflected forms for all words that 
> also have roots in some synonym file; the stemmers are not reversible.
> 
> Any other ideas?
> 
> Thanks,
> Markus
> 
> -----Original message-----
>> From:Steve Rowe <sar...@gmail.com>
>> Sent: Thursday 21st December 2017 0:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>> 
>> Hi Markus,
>> 
>> My suggestion: rewrite your synonyms to include the triggering word in the 
>> expanded synonyms list.  That way you won’t need 
>> KeywordRepeat/RemoveDuplicates filters, and mm=100% will work as you expect.
>> 
>> I don’t think this situation is a bug, since mm applies to the built query, 
>> not to the original query terms.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Dec 20, 2017, at 5:02 PM, Markus Jelsma <markus.jel...@openindex.io> 
>>> wrote:
>>> 
>>> Hello,
>>> 
>>> Yes of course, index-time synonyms lessen the query-time complexity and 
>>> will solve the mm problem. They also screw up IDF and the flexibility of 
>>> adding synonyms on demand. The first we do not want, the second is 
>>> impossible for us (very large main search index).
>>> 
>>> We are looking for a solution with mm that takes KeywordRepeat, stemming 
>>> and synonym expansion into consideration. To me the current working of mm 
>>> in this case is a bug: I input one term, so mm should treat it as one term, 
>>> regardless of the expanded query terms.
>>> 
>>> Any query-time ideas to share? I am not well versed with the actual code 
>>> dealing with this specific subject; the code doesn't like me. I am fine if 
>>> someone points me to the code that tells mm about the number of original 
>>> input terms, and what to do. If someone does, please also explain why the 
>>> change I want to make is a bad one, what to be aware of or what to beware 
>>> of, or what to take into account.
>>> 
>>> Also, am I the only one who regards this behaviour as a bug, or, more 
>>> subtly, as weird, unexpected behaviour?
>>> 
>>> Many many thanks!
>>> Markus
>>> 
>>> -----Original message-----
>>>> From:Shawn Heisey <apa...@elyograg.org>
>>>> Sent: Wednesday 20th December 2017 22:39
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>>>> 
>>>> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
>>>>> I have an interesting issue with mm and SynonymQuery and 
>>>>> KeywordRepeatFilter. We do query-time synonym expansion and use 
>>>>> KeywordRepeat so that the unstemmed forms are searched as well as the 
>>>>> stemmed ones. Our synonyms are already preprocessed and contain only 
>>>>> stemmed tokens. Synonym file contains: traject,verbind
>>>>> 
>>>>> So, any non-root stem that ends up in a synonym is actually a search for 
>>>>> three terms: +DisjunctionMaxQuery(((title_nl:trajecten 
>>>>> Synonym(title_nl:traject title_nl:verbind))))
>>>>> 
>>>>> But, our default mm requires that two terms must match if the input query 
>>>>> consists of two terms: 2<-1 5<-2 6<90%
>>>>> 
>>>>> So, a simple query looking for a plural (trajecten) will not match a 
>>>>> document where the title contains only its singular form: q=trajecten 
>>>>> will not match document with title_nl:"een traject"
>>>> 
>>>> I would think that doing synonym expansion at index time would remove
>>>> any possible confusion about the number of terms at query time.  Queries
>>>> that involve synonyms will be slightly less complex, but the index would
>>>> be larger, so it's difficult to say whether those kinds of queries would
>>>> be any faster or not.
>>>> 
>>>> There is one clear disadvantage to index-time synonym expansion: If you
>>>> change your synonyms, you have to reindex.
>>>> 
>>>> Thanks,
>>>> Shawn
>>>> 
>>>> 
>> 
>> 
