[ 
https://issues.apache.org/jira/browse/LUCENE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Cowan updated LUCENE-1372:
-------------------------------

    Attachment: lucene-multisort.patch

Patch which deals with this in the case of Strings, with a test case. This is a 
POC example; if people are happy with the approach I'll implement for the other 
types (float, int, etc) as I think it makes sense there also.

> Proposal: introduce more sensible sorting when a doc has multiple values for 
> a term
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1372
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1372
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.2
>            Reporter: Paul Cowan
>            Priority: Minor
>         Attachments: lucene-multisort.patch
>
>
> At the moment, FieldCacheImpl has somewhat disconcerting values when sorting 
> on a field for which multiple values exist for one document. For example, 
> imagine a field "fruit" which is added to a document multiple times, with the 
> values as follows:
> doc 1: {"apple"}
> doc 2: {"banana"}
> doc 3: {"apple", "banana"}
> doc 4: {"apple", "zebra"}
> if one sorts on the field "fruit", the loop in 
> FieldCacheImpl.stringsIndexCache.createValue() (and similarly for the other 
> methods in the various FieldCacheImpl caches) does the following:
>           while (termDocs.next()) {
>             retArray[termDocs.doc()] = t;
>           }
> which means that we look over the terms in their natural order and, on each 
> one, overwrite retArray[doc] with the value for each document with that term. 
> Effectively, this overwriting means that a string sort in this circumstance 
> will sort by the LAST term lexicographically, so the docs above will 
> effecitvely be sorted as if they had the single values ("apple", "banana", 
> "banana", "zebra") which is nonintuitive. To change this to sort on the first 
> time in the TermEnum seems relatively trivial and low-overhead; while it's 
> not perfect (it's not local-aware, for example) the behaviour seems much more 
> sensible to me. Interested to see what people think.
> Patch to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to