Re: More like this scoring algorithm unclear

Maarten Roosendaal Wed, 08 Jan 2014 10:20:45 -0800

The scoring algorithm is still unclear to me, but I got the query to act 
roughly like the API. The results are different, though, so I'm still doing 
something wrong. Here's an example:
{
  "explain": true,
  "query": {
    "more_like_this": {
      "fields": [
        "PRODUCT_ID"
      ],
      "like_text": "1000004004855475 1001004002067765 1002004000094210 1002004004499883",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "max_query_terms": 1,
      "percent_terms_to_match": 0.5
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {}
}


The like_text contains the product_ids from a wishlist for which I want to 
find similar lists.
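
A minimal sketch of how the request body above could be assembled from a 
wishlist's product IDs, assuming Python and the same parameter names as in 
the query above. The helper name build_mlt_query is hypothetical. Setting 
max_query_terms to the number of IDs is an assumption made here so that 
every ID can contribute a query term; the example above uses 1.

```python
import json


def build_mlt_query(product_ids, field="PRODUCT_ID", size=50):
    """Build a more_like_this request body from a wishlist's product IDs.

    Hypothetical helper: parameter names mirror the query shown above.
    """
    ids = [str(pid) for pid in product_ids]
    return {
        "explain": True,
        "query": {
            "more_like_this": {
                "fields": [field],
                # like_text is a single whitespace-separated string of IDs.
                "like_text": " ".join(ids),
                "min_term_freq": 1,
                "min_doc_freq": 1,
                # Assumption: allow one query term per product ID.
                "max_query_terms": len(ids),
                "percent_terms_to_match": 0.5,
            }
        },
        "from": 0,
        "size": size,
    }


wishlist = [1000004004855475, 1001004002067765,
            1002004000094210, 1002004004499883]
print(json.dumps(build_mlt_query(wishlist), indent=2))
```

The printed JSON can be pasted as the body of a _search request against the 
index.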

On Wednesday, 8 January 2014 16:50:53 UTC+1, Maarten Roosendaal wrote:
>
> Hi,
>
> Thanks, I'm not quite sure how to do that. I'm using:
> http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1
>
> The request body does not seem to be respected (I'm using the 
> elasticsearch head plugin) if I add:
> {
>   "explain": true
> }
>
> I've been trying to rewrite the MLT API call as an MLT query, but no luck 
> so far. Any suggestions?
>
> Thanks,
> Maarten
>
> On Wednesday, 8 January 2014 16:14:25 UTC+1, Justin Treher wrote:
>>
>> Hey Maarten,
>>
>> I would use the "explain": true option to see just why some documents 
>> are being scored higher than others. MoreLikeThis uses the same full-text 
>> scoring as far as I know, so term position would affect score.
>>
>>
>> http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
>>
>> Justin
>>
>> On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:
>>>
>>> Hi,
>>>
>>> I have a question about why the 'more like this' algorithm scores some 
>>> documents higher than others, while they are (at first glance) the same.
>>>
>>> What I've done is index wishlist documents that contain one property, 
>>> product_id, which holds an array of product IDs (e.g. [1234, 4444, 5555, 
>>> 6666]). What I'm trying to do is find similar wishlists for a given 
>>> wishlist with id x. The MLT API seems to work: it returns other 
>>> documents that contain at least one of the product IDs from the original 
>>> list.
>>>
>>> But what I see is that, for example, I get 10 hits and the first 6 hits 
>>> contain the same (and only 1) product_id, which is present in the 
>>> original wishlist. I would expect the first 6 to have the same score. 
>>> However, only the first 2 have the same score, the next 2 have a lower 
>>> score, and the next 2 lower still. Why is this?
>>>
>>> Also, I'm trying to write the MLT API call as an MLT query, but somehow 
>>> it doesn't work. I would expect that I need to take the entire content 
>>> of the original product_id property and feed it as input for 
>>> 'like_text'. The documentation is not very clear and doesn't provide 
>>> examples, so I'm a little lost.
>>>
>>> Hope someone can give some pointers.
>>>
>>> Thanks,
>>> Maarten
>>>
>>
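
On the original question of why hits that share the same single product_id 
can still score differently: one plausible cause, assuming the index uses 
Lucene's classic TF-IDF similarity, is field-length normalization, which 
gives a match in a short product_id array more weight than the same match in 
a long one. The sketch below is a simplified illustration under that 
assumption. It ignores queryNorm, coord, boosts, and the lossy one-byte 
encoding Lucene uses to store norms, and all numbers (1000 wishlists, 6 of 
them containing the product) are hypothetical.

```python
import math


def tf(freq):
    # Classic Lucene term-frequency factor: sqrt of occurrences in the field.
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    # Classic Lucene inverse document frequency.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    # Shorter fields get a larger norm: 1 / sqrt(number of terms in field).
    return 1.0 / math.sqrt(num_terms)


def term_score(freq, num_docs, doc_freq, num_terms):
    # Simplified per-term score: tf * idf^2 * lengthNorm.
    return tf(freq) * idf(num_docs, doc_freq) ** 2 * length_norm(num_terms)


# Hypothetical index: 1000 wishlists, 6 of which contain the matching product.
# Each wishlist matches on exactly one product_id but has a different length.
for list_length in (2, 4, 8):
    score = term_score(freq=1, num_docs=1000, doc_freq=6, num_terms=list_length)
    print(f"wishlist with {list_length} products: score {score:.3f}")
```

Because stored norms are rounded to one byte, wishlists of similar length 
can end up with identical scores, which would be consistent with hits 
scoring the same in pairs. The "explain": true output shows a fieldNorm 
value per hit that can be used to check whether this is what is happening.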
