Hi Maarten,
Your 'like_text' is analyzed the same way your 'product_id' field is
analyzed, unless you override it with 'analyzer'. I would recommend setting
'percent_terms_to_match' to 0. However, if you are only searching over
product ids, then a simple boolean query would do. If not, I would create
a boolean query where each clause is a 'more like this field' query, one
for each field of the queried document. This is actually what the mlt API does.
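
A rough sketch of both options, written in Python only to build the request
bodies. The function names are made up for illustration, the field name
'product_id' is taken from this thread, and the 'term' and
'more_like_this_field' clauses are assumed to be available in the
Elasticsearch version you are running. Treat it as a starting point, not as
the exact query the mlt API generates.

```python
import json


def bool_term_query(product_ids, field="product_id"):
    """Simple boolean query: one optional `term` clause per product id.

    With only `should` clauses, a document must match at least one of them,
    and documents matching more ids get a higher score.
    """
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: pid}} for pid in product_ids]
            }
        }
    }


def bool_mlt_field_query(doc_fields, min_term_freq=1, min_doc_freq=1):
    """Boolean query with one `more_like_this_field` clause per field.

    `doc_fields` maps a field name to the text of that field in the
    document you want to find neighbours for (hypothetical input shape).
    """
    clauses = []
    for field, text in doc_fields.items():
        clauses.append({
            "more_like_this_field": {
                field: {
                    "like_text": text,
                    "min_term_freq": min_term_freq,
                    "min_doc_freq": min_doc_freq,
                    "percent_terms_to_match": 0,
                }
            }
        })
    return {"query": {"bool": {"should": clauses}}}


# Product ids copied from the example query in this thread.
wishlist = [
    "1000004004855475",
    "1001004002067765",
    "1002004000094210",
    "1002004004499883",
]

print(json.dumps(bool_term_query(wishlist), indent=2))
print(json.dumps(bool_mlt_field_query({"product_id": " ".join(wishlist)}), indent=2))
```

Either printed body can be posted to the _search endpoint of the index,
optionally with "explain": true added at the top level to inspect the scores.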
Cheers,
Alex
On Wednesday, January 8, 2014 7:20:05 PM UTC+1, Maarten Roosendaal wrote:
>
> The scoring algorithm is still vague, but I got the query to act like the
> API. The results are different, though, so I'm still doing something wrong.
> Here's an example:
> {
>   "explain": true,
>   "query": {
>     "more_like_this": {
>       "fields": [
>         "PRODUCT_ID"
>       ],
>       "like_text": "1000004004855475 1001004002067765 1002004000094210 1002004004499883",
>       "min_term_freq": 1,
>       "min_doc_freq": 1,
>       "max_query_terms": 1,
>       "percent_terms_to_match": 0.5
>     }
>   },
>   "from": 0,
>   "size": 50,
>   "sort": [],
>   "facets": {}
> }
>
> The like_text contains the product ids from a wishlist for which I want to
> find similar lists.
>
> On Wednesday, January 8, 2014 4:50:53 PM UTC+1, Maarten Roosendaal wrote:
>>
>> Hi,
>>
>> Thanks, I'm not quite sure how to do that. I'm using:
>> http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1
>>
>> The body does not seem to be respected (I'm using the elasticsearch head
>> plugin) if I add:
>> {
>> "explain": true
>> }
>>
>> I've been trying to rewrite the mlt API call as an mlt query, but no luck
>> so far. Any suggestions?
>>
>> Thanks,
>> Maarten
>>
>> On Wednesday, January 8, 2014 4:14:25 PM UTC+1, Justin Treher wrote:
>>>
>>> Hey Maarten,
>>>
>>> I would use the "explain":true option to see just why some documents are
>>> being scored higher than others. MoreLikeThis uses the same full-text
>>> scoring as far as I know, so term position would affect score.
>>>
>>>
>>> http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
>>>
>>> Justin
>>>
>>> On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have a question about why the 'more like this' algorithm scores some
>>>> documents higher than others, while they are (at first glance) the same.
>>>>
>>>> What I've done is index wishlist documents that contain one property,
>>>> product_id, which holds an array of product ids (e.g. [1234, 4444, 5555,
>>>> 6666]). What I'm trying to do is find similar wishlists for a given
>>>> wishlist with id x. The MLT API seems to work: it returns other
>>>> documents that contain at least one of the product ids from the original
>>>> list.
>>>>
>>>> But what I see is that, for example, I get 10 hits and the first 6 hits
>>>> contain the same (and only 1) product_id, which is present in the
>>>> original wishlist. I would expect the first 6 to have the same score.
>>>> However, only the first 2 have the same score, the next 2 have a lower
>>>> score, and the next 2 an even lower one. Why is this?
>>>>
>>>> Also, I'm trying to write the MLT API call as an MLT query, but somehow
>>>> it doesn't work. I would expect that I need to take the entire content of
>>>> the original product_id property and feed it as input for the
>>>> 'like_text'. The documentation is not very clear and doesn't provide
>>>> examples, so I'm a little lost.
>>>>
>>>> Hope someone can give some pointers.
>>>>
>>>> Thanks,
>>>> Maarten
>>>>
>>>