More like this scoring algorithm unclear

Maarten Roosendaal Wed, 08 Jan 2014 00:05:14 -0800

Hi,

I have a question about why the 'more like this' algorithm scores some
documents higher than others when they are (at first glance) the same.

What I've done is index wishlist documents that contain one property,
product_id, which holds an array of product ids (e.g. [1234, 4444, 5555,
6666]). What I'm trying to do is find similar wishlists for a given
wishlist with id x. The MLT API seems to work: it returns other documents
that contain at least one of the product ids from the original list.
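
To make the setup concrete, here is a small Python sketch of the document
shape and of the matching behaviour described above. Only the array
[1234, 4444, 5555, 6666] comes from the example; the wishlist ids ("x", "a",
"b", "c"), the other product ids, and the helper name shares_product are
made up for illustration. It only models "shares at least one product_id"
and says nothing about how the hits are scored.

```python
# Hypothetical wishlist documents; each has a single product_id array.
wishlists = {
    "x": {"product_id": [1234, 4444, 5555, 6666]},  # the source wishlist
    "a": {"product_id": [1234]},
    "b": {"product_id": [4444, 9999]},
    "c": {"product_id": [7777]},
}


def shares_product(source_id, docs):
    """Return ids of other wishlists sharing at least one product_id."""
    source = set(docs[source_id]["product_id"])
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if doc_id != source_id and source & set(doc["product_id"])
    )


matches = shares_product("x", wishlists)
print(matches)
```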

But what I see is this: I get, for example, 10 hits, and the first 6 hits
each contain the same single product_id, which is also present in the
original wishlist. I would expect the first 6 to have the same score.
However, only the first 2 share a score; the next 2 score lower and the
next 2 lower still. Why is this?

Also, I'm trying to rewrite the MLT API call as an MLT query, but somehow
it doesn't work. I would expect that I need to take the entire content of
the original product_id property and feed it as input to 'like_text'. The
documentation is not very clear and doesn't provide examples, so I'm a
little lost.
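
A minimal sketch of what such a query body might look like, built as a
Python dict. It assumes that 'like_text' takes the product ids joined by
spaces, and that min_term_freq and min_doc_freq should be lowered to 1
because each id occurs only once per wishlist (the defaults would otherwise
filter out most terms). The helper name build_mlt_query is hypothetical, and
whether this reproduces the MLT API results exactly is untested.

```python
import json


def build_mlt_query(product_ids, field="product_id"):
    """Build a more_like_this query body from a list of product ids."""
    # Assumption: the ids are passed as one whitespace-separated string.
    like_text = " ".join(str(pid) for pid in product_ids)
    return {
        "query": {
            "more_like_this": {
                "fields": [field],
                "like_text": like_text,
                # Assumption: keep terms that occur once per document
                # and in as few as one document.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


body = build_mlt_query([1234, 4444, 5555, 6666])
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a normal search request
against the wishlist index.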

Hope someone can give some pointers.

Thanks,
Maarten

