I should also note that the custom scoring is not the problem: searches still
prefer documents with many values for contributors.name even when the custom
scoring is removed.
On Wednesday, June 25, 2014 7:29:26 PM UTC-7, shane wrote:
>
> I've got some indexed documents with some data that looks like this:
>
> "_source":{
> "title":"The Fault in Our Stars",
> "contributors":{
> "name":"John Green"
> }
> }
>
> Then I've got other documents with multiple contributors:
>
> "_source":{
> "title":"Horror for Good: A Charitable Anthology",
> "contributors":{
> "name":[
> "Joe R Lansdale",
> "Ray Garton",
> "F. Paul Wilson",
> "Ian Harding",
> "Shaun Hutson",
> "Jeff Strand",
> "Jack Ketchum",
> "Wrath James White",
> "Monica J. O'Rourke",
> "Lisa Morton",
> "Laird Barron",
> "Joe McKinney",
> "Richard Salter",
> "Thomas Lee",
> "Gary McMahon",
> "Taylor Grant",
> "Lorne Dixon",
> "Nate Southard",
> "Tracie McBride",
> "Robert S Wilson",
> "John Mantooth",
> "G.N. Braun",
> "John F D Taff",
> "Benjamin Kane Ethridge",
> "Stephen Bacon",
> "Steven W Booth",
> "Brad C. Hodson",
> "Jonathon Templar",
> "Mark Scioneaux",
> "R.J. Cavender",
> "Norman L. Rubenstein",
> "Danica Green",
> "G.R. Yeates",
> "Boyd E. Harris",
> "Rena Mason"
> ]
> }
> }
>
> (This is simplified data, but I've included the relevant parts.)
>
> The issue arises when I run a multi_match query on contributors.name. Take
> the query "john green" as an example: documents like the second one always
> score higher, even though "john" and "green" come from different
> contributors' names and aren't at all close to each other.
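
For reference, here is a minimal sketch of that query as a `_search` request body. The exact query isn't shown in the post, so this is a reconstruction that assumes the default multi_match type and only the contributors.name field:

```json
{
  "query": {
    "multi_match": {
      "query": "john green",
      "fields": ["contributors.name"]
    }
  }
}
```
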
>
> If I run a multi_match query with type=phrase, that helps, but a document
> with John Green as one of several authors always comes up at the top of the
> list, beating any document with John Green as the only author, even though
> I have custom scoring in place that should cause another particular document
> to score higher. Also, I don't want to use phrase matching, because I want
> to be able to run a cross_fields query across title and contributors.name
> so that queries like "the fault in our stars john green" work.
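
A sketch of the cross_fields query described above, assuming the fields are named title and contributors.name as in the sample data. This is a reconstruction for illustration, not the exact query used:

```json
{
  "query": {
    "multi_match": {
      "query": "the fault in our stars john green",
      "type": "cross_fields",
      "fields": ["title", "contributors.name"]
    }
  }
}
```

The phrase variant mentioned above would use the same body with "type": "phrase" and a query such as "john green".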
>
> So why are arrays with many values preferred over the single value in this
> case? And how can I index or query this data to avoid this problem?
>
> This is on ES 1.1.1.
>
> Thanks,
>
> Shane
>
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/8b4fd820-af8b-42df-ac2d-81ac5a51e7af%40googlegroups.com.