The scoring algorithm is still unclear to me, but I got the query to behave
like the API. The results are different, though, so I'm still doing something
wrong. Here's an example:
{
  "explain": true,
  "query": {
    "more_like_this": {
      "fields": [
        "PRODUCT_ID"
      ],
      "like_text": "1000004004855475 1001004002067765 1002004000094210 1002004004499883",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "max_query_terms": 1,
      "percent_terms_to_match": 0.5
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {}
}
The like_text contains the product IDs from a wishlist for which I want to
find similar lists.
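For the query-versus-API comparison above, here is a minimal Python sketch that builds the same search body from a wishlist's product IDs. It assumes the ES 0.90/1.x-era option names used in this thread ('like_text', 'percent_terms_to_match') and that product_id is an analyzed string field, so joining the IDs with spaces yields one term per ID. The helper name build_mlt_query is hypothetical. Two things worth checking when the query and the API disagree: field names are case-sensitive, so 'PRODUCT_ID' and 'product_id' address different fields, and max_query_terms defaults to 25, so a value of 1 keeps only one of the four IDs as a query term.

```python
import json


def build_mlt_query(product_ids, field="product_id", max_query_terms=25,
                    percent_terms_to_match=0.5, size=50, explain=True):
    """Build a more_like_this search body from a wishlist's product IDs.

    Hypothetical helper. Assumes the ES 0.90/1.x query syntax used in this
    thread; later Elasticsearch versions renamed some of these options.
    """
    return {
        "explain": explain,
        "query": {
            "more_like_this": {
                # Field names are case-sensitive: use the same name the API call uses.
                "fields": [field],
                # Assumes an analyzed string field: spaces split the text into one term per ID.
                "like_text": " ".join(str(pid) for pid in product_ids),
                # MLT defaults are min_term_freq=2 and min_doc_freq=5; 1 keeps rare IDs.
                "min_term_freq": 1,
                "min_doc_freq": 1,
                # Default is 25; a value of 1 would keep only a single ID.
                "max_query_terms": max_query_terms,
                "percent_terms_to_match": percent_terms_to_match,
            }
        },
        "from": 0,
        "size": size,
    }


# The wishlist from the example above.
wishlist = [1000004004855475, 1001004002067765, 1002004000094210, 1002004004499883]
body = build_mlt_query(wishlist)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the head plugin's request body; passing the field name explicitly makes it easy to try both spellings against the same index.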
On Wednesday, January 8, 2014 16:50:53 UTC+1, Maarten Roosendaal wrote:
>
> Hi,
>
> Thanks, I'm not quite sure how to do that. I'm using:
> http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1
>
> The body does not seem to be respected (I'm using the elasticsearch-head
> plugin) if I add:
> {
> "explain": true
> }
>
> I've been trying to rewrite the MLT API call as an MLT query, but no luck so
> far. Any suggestions?
>
> Thanks,
> Maarten
>
> On Wednesday, January 8, 2014 16:14:25 UTC+1, Justin Treher wrote:
>>
>> Hey Maarten,
>>
>> I would use the "explain": true option to see exactly why some of your
>> documents are scored higher than others. As far as I know, MoreLikeThis
>> uses the same full-text scoring, so term position would affect the score.
>>
>>
>> http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
>>
>> Justin
>>
>> On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:
>>>
>>> Hi,
>>>
>>> I have a question about why the 'more like this' algorithm scores some
>>> documents higher than others, even though they look the same at first glance.
>>>
>>> What I've done is index wishlist documents that contain one property,
>>> product_id, which holds an array of product IDs (e.g. [1234, 4444, 5555,
>>> 6666]). What I'm trying to do is find similar wishlists for a given
>>> wishlist with ID x. The MLT API seems to work: it returns other documents
>>> that contain at least one of the product IDs from the original list.
>>>
>>> But what I see is this: for example, I get 10 hits, and the first 6 hits
>>> contain the same (and only one) product_id, which is also present in the
>>> original wishlist. I would expect the first 6 to have the same score.
>>> However, only the first 2 share a score, the next 2 score lower, and the
>>> next 2 lower still. Why is this?
>>>
>>> Also, I'm trying to rewrite the MLT API call as an MLT query, but somehow
>>> it doesn't work. I would expect to take the entire content of the original
>>> product_id property and feed it as input to 'like_text'. The documentation
>>> is not very clear and doesn't provide examples, so I'm a little lost.
>>>
>>> Hope someone can give some pointers.
>>>
>>> Thanks,
>>> Maarten
>>>
>>
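Justin's point that MoreLikeThis reuses the normal full-text scoring can be made concrete with a minimal Python sketch. It assumes Lucene's classic DefaultSimilarity (the default in Elasticsearch at the time) and computes one matching term's contribution as tf x idf^2 x lengthNorm, leaving out queryNorm, coord and boosts because they are the same for every hit of a single query. The index size, document frequency and wishlist sizes are made-up numbers, and the function names are hypothetical. The sketch shows that wishlists sharing the same single product ID still score differently when they hold different numbers of products, because lengthNorm is 1/sqrt(number of terms). Lucene also rounds the norm to a single byte, so wishlists of similar length can end up tied, and by default IDF is computed per shard, which can shift scores as well; neither effect is modelled here.

```python
import math


def classic_tf(freq: int) -> float:
    # DefaultSimilarity: tf = sqrt(term frequency in the document)
    return math.sqrt(freq)


def classic_idf(doc_freq: int, num_docs: int) -> float:
    # DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms: int) -> float:
    # DefaultSimilarity: lengthNorm = 1 / sqrt(number of terms in the field).
    # Lucene additionally rounds this to one byte; that step is skipped here.
    return 1.0 / math.sqrt(num_terms)


def term_score(freq: int, doc_freq: int, num_docs: int, num_terms: int) -> float:
    """Contribution of one matching term, ignoring queryNorm, coord and boosts."""
    idf = classic_idf(doc_freq, num_docs)
    return classic_tf(freq) * idf * idf * length_norm(num_terms)


# Hypothetical index: 1000 wishlists; the shared product ID appears in 10 of them.
NUM_DOCS, DOC_FREQ = 1000, 10

# Total number of product IDs in each hypothetical wishlist (shared ID included).
wishlist_sizes = {"list_a": 1, "list_b": 4, "list_c": 9}

# Every wishlist contains the shared ID exactly once (freq = 1).
scores = {name: term_score(1, DOC_FREQ, NUM_DOCS, size)
          for name, size in wishlist_sizes.items()}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

Only the wishlist length differs between the three hits, yet the longer lists rank lower; running the real query with "explain": true should show which of these factors actually differ between the tied and non-tied hits.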
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/c7032391-2456-47a0-a3b8-1f5fe61127e7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.