[ https://issues.apache.org/jira/browse/LUCENE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Cowan updated LUCENE-1372:
-------------------------------

    Attachment: lucene-multisort.patch

Patch which deals with this in the case of Strings, with a test case. This is a POC example; if people are happy with the approach I'll implement it for the other types (float, int, etc.), as I think it makes sense there also.

> Proposal: introduce more sensible sorting when a doc has multiple values for a term
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1372
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1372
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.2
>            Reporter: Paul Cowan
>            Priority: Minor
>         Attachments: lucene-multisort.patch
>
>
> At the moment, FieldCacheImpl produces somewhat disconcerting results when
> sorting on a field for which multiple values exist for one document. For
> example, imagine a field "fruit" which is added to a document multiple times,
> with the values as follows:
> doc 1: {"apple"}
> doc 2: {"banana"}
> doc 3: {"apple", "banana"}
> doc 4: {"apple", "zebra"}
> If one sorts on the field "fruit", the loop in
> FieldCacheImpl.stringsIndexCache.createValue() (and similarly for the other
> methods in the various FieldCacheImpl caches) does the following:
> while (termDocs.next()) {
>     retArray[termDocs.doc()] = t;
> }
> which means that we loop over the terms in their natural order and, for each
> term, overwrite retArray[doc] for every document containing that term.
> Effectively, this overwriting means that a string sort in this circumstance
> will sort by the LAST term lexicographically, so the docs above will
> effectively be sorted as if they had the single values ("apple", "banana",
> "banana", "zebra"), which is nonintuitive.
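A minimal sketch of the overwriting behaviour described above, using plain Java collections rather than Lucene's API. The class name, the method names, and the TreeMap standing in for the TermEnum/TermDocs iteration are all hypothetical, and the sketch stores the term string directly, which may differ from what the real cache stores.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class LastTermWins {
    // Hypothetical stand-in for the cache fill loop: terms are visited in
    // sorted order, and each term overwrites the slot of every doc holding it.
    public static String[] fillLastTermWins(TreeMap<String, int[]> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                retArray[doc] = entry.getKey(); // unconditional overwrite
            }
        }
        return retArray;
    }

    // The "fruit" example from the issue: term -> ids of docs containing it.
    public static TreeMap<String, int[]> fruitPostings() {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {1, 3, 4});
        postings.put("banana", new int[] {2, 3});
        postings.put("zebra", new int[] {4});
        return postings;
    }

    public static void main(String[] args) {
        // Slot 0 is unused because the example numbers its docs from 1.
        String[] values = fillLastTermWins(fruitPostings(), 5);
        System.out.println(Arrays.toString(values));
        // prints [null, apple, banana, banana, zebra]
    }
}
```

Docs 3 and 4 end up keyed on "banana" and "zebra", matching the effective sort values listed in the issue.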
> To change this to sort on the first term in the TermEnum seems relatively
> trivial and low-overhead; while it's not perfect (it's not locale-aware, for
> example), the behaviour seems much more sensible to me. Interested to see what
> people think.
> Patch to follow.
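A minimal sketch of the "first term wins" idea, written against plain Java collections. It is not taken from the attached patch: the class name, the method names, and the null check used as the "already set" test are assumptions made for illustration, and a TreeMap stands in for iterating the TermEnum in sorted order.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class FirstTermWins {
    // Hypothetical variant of the fill loop: a doc's slot is written only the
    // first time the doc is seen, so the lexicographically first term is kept.
    public static String[] fillFirstTermWins(TreeMap<String, int[]> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (retArray[doc] == null) { // skip docs that already have a value
                    retArray[doc] = entry.getKey();
                }
            }
        }
        return retArray;
    }

    // The "fruit" example from the issue: term -> ids of docs containing it.
    public static TreeMap<String, int[]> fruitPostings() {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {1, 3, 4});
        postings.put("banana", new int[] {2, 3});
        postings.put("zebra", new int[] {4});
        return postings;
    }

    public static void main(String[] args) {
        // Slot 0 is unused because the example numbers its docs from 1.
        String[] values = fillFirstTermWins(fruitPostings(), 5);
        System.out.println(Arrays.toString(values));
        // prints [null, apple, banana, apple, apple]
    }
}
```

With this guard, docs 3 and 4 sort alongside doc 1 under "apple". The extra cost is one comparison per posting, which is consistent with the "low-overhead" claim, though the ordering still follows raw term order and so is not locale-aware.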