[ https://issues.apache.org/jira/browse/LUCENE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628513#action_12628513 ]
Paul Smith commented on LUCENE-1372:
------------------------------------

bq. I'm not following this argument. Will it be less silly when {zebra,apple} sorts before {banana}?

Well, at the presentation layer I don't think you'd present it like that (we don't). We'd sort the list of attributes so that it would appear as "apple,zebra".

> Proposal: introduce more sensible sorting when a doc has multiple values for a term
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1372
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1372
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.2
>            Reporter: Paul Cowan
>            Priority: Minor
>         Attachments: lucene-multisort.patch
>
>
> At the moment, FieldCacheImpl has somewhat disconcerting values when sorting
> on a field for which multiple values exist for one document. For example,
> imagine a field "fruit" which is added to a document multiple times, with the
> values as follows:
>   doc 1: {"apple"}
>   doc 2: {"banana"}
>   doc 3: {"apple", "banana"}
>   doc 4: {"apple", "zebra"}
> If one sorts on the field "fruit", the loop in
> FieldCacheImpl.stringsIndexCache.createValue() (and similarly for the other
> methods in the various FieldCacheImpl caches) does the following:
>   while (termDocs.next()) {
>     retArray[termDocs.doc()] = t;
>   }
> which means that we loop over the terms in their natural order and, for each
> one, overwrite retArray[doc] with that term for every document containing it.
> Effectively, this overwriting means that a string sort in this circumstance
> will sort by the LAST term lexicographically, so the docs above will
> effectively be sorted as if they had the single values ("apple", "banana",
> "banana", "zebra"), which is nonintuitive.
> To change this to sort on the first term in the TermEnum seems relatively
> trivial and low-overhead; while it's not perfect (it's not locale-aware, for
> example) the behaviour seems much more sensible to me. Interested to see what
> people think.
> Patch to follow.
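
As a rough illustration of the overwrite described in the issue, the sketch below simulates the term walk with plain Java collections rather than Lucene classes. All names here (MultiValueSortSketch, sampleDocs, invert, lastTermWins, firstTermWins) are hypothetical, and the "only set the slot if it is still empty" guard is just an assumption about what sorting on the first term could look like; the attached patch may well do it differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the FieldCacheImpl loop; this is not Lucene code.
public class MultiValueSortSketch {

    // The four example documents from the issue, keyed by doc number.
    static Map<Integer, List<String>> sampleDocs() {
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, Arrays.asList("apple"));
        docs.put(2, Arrays.asList("banana"));
        docs.put(3, Arrays.asList("apple", "banana"));
        docs.put(4, Arrays.asList("apple", "zebra"));
        return docs;
    }

    // Builds term -> doc numbers with terms in natural order,
    // which is the order a walk over the terms would visit them.
    static TreeMap<String, List<Integer>> invert(Map<Integer, List<String>> docs) {
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> doc : docs.entrySet()) {
            for (String term : doc.getValue()) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(doc.getKey());
            }
        }
        return postings;
    }

    // Behaviour described in the issue: each later term overwrites the slot.
    static String[] lastTermWins(TreeMap<String, List<Integer>> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                retArray[doc] = entry.getKey();
            }
        }
        return retArray;
    }

    // Assumed alternative: keep the first term seen for each document.
    static String[] firstTermWins(TreeMap<String, List<Integer>> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (retArray[doc] == null) {
                    retArray[doc] = entry.getKey();
                }
            }
        }
        return retArray;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> postings = invert(sampleDocs());
        // Slot 0 is unused because the example numbers docs from 1.
        String[] last = Arrays.copyOfRange(lastTermWins(postings, 5), 1, 5);
        String[] first = Arrays.copyOfRange(firstTermWins(postings, 5), 1, 5);
        System.out.println(Arrays.toString(last));  // [apple, banana, banana, zebra]
        System.out.println(Arrays.toString(first)); // [apple, banana, apple, apple]
    }
}
```

With the first-term variant, docs 1, 3 and 4 all sort under "apple" and come before doc 2, which is the ordering the proposal argues is more intuitive.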