I should also note that the custom scoring is not the problem: the query still 
prefers documents with many values for contributors.name even when 
the custom scoring is removed.
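
For reference, the kind of request described in the quoted message below can be sketched roughly as follows. This is only a minimal illustration, not the exact query in use: the helper name build_query is hypothetical, and the field list assumes the title and contributors.name fields from the sample documents. It builds the request body only and does not contact a cluster.

```python
import json


def build_query(text, match_type):
    """Build a multi_match request body (hypothetical helper, for illustration)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": match_type,
                # Assumed field list, taken from the sample documents.
                "fields": ["title", "contributors.name"],
            }
        }
    }


# The phrase variant that helps, but still ranks multi-author documents first.
phrase_body = build_query("john green", "phrase")

# The cross_fields variant that should match title and contributor together.
cross_fields_body = build_query("the fault in our stars john green", "cross_fields")

print(json.dumps(cross_fields_body, indent=2))
```

The two bodies differ only in the "type" value, so the same search can be run both ways and the scores compared directly.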

On Wednesday, June 25, 2014 7:29:26 PM UTC-7, shane wrote:
>
> I've got some indexed documents with some data that looks like this:
>
>         "_source":{
>                "title":"The Fault in Our Stars",
>                "contributors":{
>                   "name":"John Green"
>                }
>         }
>
> Then I've got other documents with multiple contributors:
>
>             "_source":{
>                "title":"Horror for Good: A Charitable Anthology",
>                "contributors":{
>                   "name":[
>                      "Joe R Lansdale",
>                      "Ray Garton",
>                      "F. Paul Wilson",
>                      "Ian Harding",
>                      "Shaun Hutson",
>                      "Jeff Strand",
>                      "Jack Ketchum",
>                      "Wrath James White",
>                      "Monica J. O'Rourke",
>                      "Lisa Morton",
>                      "Laird Barron",
>                      "Joe McKinney",
>                      "Richard Salter",
>                      "Thomas Lee",
>                      "Gary McMahon",
>                      "Taylor Grant",
>                      "Lorne Dixon",
>                      "Nate Southard",
>                      "Tracie McBride",
>                      "Robert S Wilson",
>                      "John Mantooth",
>                      "G.N. Braun",
>                      "John F D Taff",
>                      "Benjamin Kane Ethridge",
>                      "Stephen Bacon",
>                      "Steven W Booth",
>                      "Brad C. Hodson",
>                      "Jonathon Templar",
>                      "Mark Scioneaux",
>                      "R.J. Cavender",
>                      "Norman L. Rubenstein",
>                      "Danica Green",
>                      "G.R. Yeates",
>                      "Boyd E. Harris",
>                      "Rena Mason"
>                   ]
>                }
>             }
>
> (This is simplified data, but I've included the relevant parts.)
>
> The issue is when I do a multi_match query on contributors.name. Take the 
> query "john green" as an example: documents like the second one always score 
> higher, even though "john" and "green" aren't at all close to 
> each other. 
>
> A multi_match query with type=phrase helps, but a document 
> with John Green as one of several authors still comes up at the top of the 
> list, beating out any document with John Green as the only author. That 
> happens even though I have custom scoring in place that should cause another 
> particular document to score higher. Also, I don't want to do phrase matching 
> because I want to be able to do a cross_fields query across title and 
> contributors.name, so that queries like "the fault in our stars john green" work.
>
> So why are arrays with many values preferred over the single value in this 
> case? And how can I index or query this data to avoid this problem?
>
> This is on ES 1.1.1.
>
> Thanks,
>
> Shane
>
>
>
