Re: More like this scoring algorithm unclear

Alex Ksikes Tue, 06 May 2014 04:34:27 -0700

Hi Maarten,

Your 'like_text' is analyzed the same way as your 'product_id' field, unless 
you specify a different analyzer with 'analyzer'. I would recommend setting 
'percent_terms_to_match' to 0. However, if you are only searching over 
product ids, then a simple boolean query would do. If not, then I would 
create a boolean query with one 'more_like_this_field' clause for each field 
of the queried document. This is actually what the mlt API does.
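A minimal sketch of both suggestions, written in Python as request bodies for the Elasticsearch 1.x query DSL. It assumes the field is called `product_id` and that the source wishlist's id is known; the helper names `overlap_query` and `per_field_mlt_query` are hypothetical. The first builds a plain boolean query with one `term` clause per product id (so lists sharing more ids tend to score higher) and leaves out the source list; the second builds one `more_like_this_field` clause per field of the queried document, passing extra MLT parameters such as `percent_terms_to_match` through unchanged.

```python
import json


def overlap_query(product_ids, source_list_id=None, min_should_match=1):
    """Plain boolean query: one 'term' clause per product id (sketch)."""
    should = [{"term": {"product_id": str(pid)}} for pid in product_ids]
    query = {"bool": {"should": should, "minimum_should_match": min_should_match}}
    if source_list_id is not None:
        # Do not return the wishlist we started from.
        query["bool"]["must_not"] = [{"ids": {"values": [source_list_id]}}]
    return {"query": query}


def per_field_mlt_query(source_doc, **mlt_params):
    """Boolean query with one 'more_like_this_field' clause per field (sketch)."""
    should = []
    for field, value in source_doc.items():
        values = value if isinstance(value, list) else [value]
        clause = {"like_text": " ".join(str(v) for v in values)}
        clause.update(mlt_params)  # e.g. percent_terms_to_match=0
        should.append({"more_like_this_field": {field: clause}})
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    print(json.dumps(overlap_query([1234, 4444, 5555, 6666], source_list_id="x"), indent=2))
    print(json.dumps(per_field_mlt_query({"product_id": [1234, 4444]}, percent_terms_to_match=0), indent=2))
```

Either dict can be serialized with `json.dumps` and sent as the body of a `_search` request against the `lists` index; the boolean version avoids MLT's term selection entirely, which is why it suits the product-ids-only case.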


Cheers,

Alex

On Wednesday, January 8, 2014 7:20:05 PM UTC+1, Maarten Roosendaal wrote:
>
> The scoring algorithm is still vague, but I got the query to act like the 
> API. The results are different, though, so I'm still doing something wrong. 
> Here's an example:
> {
>   "explain": true,
>   "query": {
>     "more_like_this": {
>       "fields": [
>         "PRODUCT_ID"
>       ],
>       "like_text": "1000004004855475 1001004002067765 1002004000094210 1002004004499883",
>       "min_term_freq": 1,
>       "min_doc_freq": 1,
>       "max_query_terms": 1,
>       "percent_terms_to_match": 0.5
>     }
>   },
>   "from": 0,
>   "size": 50,
>   "sort": [],
>   "facets": {}
> }
>
> The like_text contains the product_ids from a wishlist for which I want to 
> find similar lists.
>
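For comparison, a sketch (hypothetical helper `wishlist_mlt_body`) that builds the query above from a wishlist's id array. Two things are worth checking against the explain output: `max_query_terms` limits how many terms the MLT query selects (the Elasticsearch 1.x default is 25), so a value of `1` keeps only one of the four product ids; and field names are case-sensitive, so `PRODUCT_ID` in this query and `product_id` in the `_mlt` URL may not refer to the same field. The sketch also applies the `percent_terms_to_match: 0` suggestion and keeps `like_text` on one line, since a raw line break is not allowed inside a JSON string.

```python
import json


def wishlist_mlt_body(product_ids, field="product_id", size=50):
    """Build a more_like_this search body from a wishlist's product ids (sketch)."""
    ids = [str(pid) for pid in product_ids]
    return {
        "explain": True,
        "query": {
            "more_like_this": {
                "fields": [field],  # field names are case-sensitive
                "like_text": " ".join(ids),  # one line, space-separated
                "min_term_freq": 1,
                "min_doc_freq": 1,
                # Allow every id to be selected as a query term.
                "max_query_terms": max(len(ids), 1),
                # Do not require a minimum fraction of the terms to match.
                "percent_terms_to_match": 0,
            }
        },
        "from": 0,
        "size": size,
    }


if __name__ == "__main__":
    body = wishlist_mlt_body(
        ["1000004004855475", "1001004002067765", "1002004000094210", "1002004004499883"]
    )
    print(json.dumps(body, indent=2))
```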
> On Wednesday, January 8, 2014 4:50:53 PM UTC+1, Maarten Roosendaal wrote:
>>
>> Hi,
>>
>> Thanks, I'm not quite sure how to do that. I'm using:
>> http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1
>>
>> The body does not seem to be respected (I'm using the elasticsearch-head 
>> plugin) if I add:
>> {
>>   "explain": true
>> }
>>
>> I've been trying to rewrite the MLT API call as an MLT query, but no luck 
>> so far. Any suggestions?
>>
>> Thanks,
>> Maarten
>>
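One way to get `explain` output is to move from the `_mlt` endpoint to a regular `_search` request, where `explain` is accepted in the body. A sketch of that translation follows, assuming the caller has already fetched the wishlist's `product_id` values (for example with a GET on the document) and that the search, like the API is assumed to do, should leave out the source document itself. The function name `mlt_url_to_search` is hypothetical. The fallbacks of 2 for `min_term_freq` and 5 for `min_doc_freq` are the Elasticsearch 1.x MLT defaults, and the thread's `mlt_field` parameter is treated as equivalent to the documented `mlt_fields` (an assumption).

```python
from urllib.parse import parse_qs, urlsplit


def mlt_url_to_search(url, source_product_ids):
    """Translate a _mlt API URL into (search path, search body). Sketch only."""
    parts = urlsplit(url)
    index, doc_type, doc_id, endpoint = parts.path.strip("/").split("/")
    if endpoint != "_mlt":
        raise ValueError("expected a /{index}/{type}/{id}/_mlt URL")
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # The thread uses mlt_field; Elasticsearch 1.x documents mlt_fields (assumed equivalent).
    field_param = params.get("mlt_fields") or params.get("mlt_field")
    if not field_param:
        raise ValueError("no mlt_fields / mlt_field parameter in the URL")
    mlt = {
        "fields": field_param.split(","),
        "like_text": " ".join(str(pid) for pid in source_product_ids),
        "min_term_freq": int(params.get("min_term_freq", 2)),
        "min_doc_freq": int(params.get("min_doc_freq", 5)),
    }
    body = {
        "explain": True,
        "query": {
            "bool": {
                "must": [{"more_like_this": mlt}],
                # Leave out the wishlist we started from.
                "must_not": [{"ids": {"values": [doc_id]}}],
            }
        },
    }
    return f"/{index}/{doc_type}/_search", body
```

For a hypothetical list id `42`, `mlt_url_to_search("http://localhost:9200/lists/list/42/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1", [1234, 4444])` returns the path `/lists/list/_search` and a body that can be pasted into the elasticsearch-head request panel.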
>> On Wednesday, January 8, 2014 4:14:25 PM UTC+1, Justin Treher wrote:
>>>
>>> Hey Maarten,
>>>
>>> I would use the "explain": true option to see why your documents are 
>>> being scored higher than others. As far as I know, MoreLikeThis uses the 
>>> same full-text scoring, so term position would affect the score.
>>>
>>>
>>> http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
>>>
>>> Justin
>>>
>>> On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have a question about why the 'more like this' algorithm scores some 
>>>> documents higher than others, even though they are (at first glance) the same.
>>>>
>>>> What I've done is index wishlist documents that contain one property, 
>>>> product_id, which holds an array of product ids (e.g. [1234, 4444, 5555, 
>>>> 6666]). What I'm trying to do is find similar wishlists for a given 
>>>> wishlist with id x. The MLT API seems to work: it returns other documents 
>>>> that contain at least one of the product_ids from the original list.
>>>>
>>>> But what I see is this: for example, I get 10 hits, and the first 6 hits 
>>>> contain the same (and only one) product_id, which is present in the 
>>>> original wishlist. I would expect the first 6 to have the same score. 
>>>> However, only the first 2 share a score, the next 2 score lower, and the 
>>>> 2 after that lower still. Why is this?
>>>>
>>>> Also, I'm trying to write the MLT API call as an MLT query, but somehow it 
>>>> doesn't work. I would expect that I need to take the entire content of the 
>>>> original product_id property and feed it as input for the 'like_text'. The 
>>>> documentation is not very clear and doesn't provide examples, so I'm a 
>>>> little lost.
>>>>
>>>> Hope someone can give some pointers.
>>>>
>>>> Thanks,
>>>> Maarten
>>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/91734252-74d0-4001-becc-a184af0f2997%40googlegroups.com.
