Proposal: introduce more sensible sorting when a doc has multiple values for a 
term
-----------------------------------------------------------------------------------

                 Key: LUCENE-1372
                 URL: https://issues.apache.org/jira/browse/LUCENE-1372
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 2.3.2
            Reporter: Paul Cowan
            Priority: Minor


At the moment, FieldCacheImpl has somewhat disconcerting values when sorting on 
a field for which multiple values exist for one document. For example, imagine 
a field "fruit" which is added to a document multiple times, with the values as 
follows:

doc 1: {"apple"}
doc 2: {"banana"}
doc 3: {"apple", "banana"}
doc 4: {"apple", "zebra"}

if one sorts on the field "fruit", the loop in 
FieldCacheImpl.stringsIndexCache.createValue() (and similarly for the other 
methods in the various FieldCacheImpl caches) does the following:

          while (termDocs.next()) {
            retArray[termDocs.doc()] = t;
          }

which means that we look over the terms in their natural order and, on each 
one, overwrite retArray[doc] with the value for each document with that term. 
Effectively, this overwriting means that a string sort in this circumstance 
will sort by the LAST term lexicographically, so the docs above will 
effecitvely be sorted as if they had the single values ("apple", "banana", 
"banana", "zebra") which is nonintuitive. To change this to sort on the first 
time in the TermEnum seems relatively trivial and low-overhead; while it's not 
perfect (it's not local-aware, for example) the behaviour seems much more 
sensible to me. Interested to see what people think.

Patch to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to