DO NOT REPLY [Bug 28748] New: - Inconsistent behaviour sorting against field with no related documents

bugzilla Mon, 03 May 2004 12:47:13 -0700

http://issues.apache.org/bugzilla/show_bug.cgi?id=28748

           Summary: Inconsistent behaviour sorting against field with no
                    related documents
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Search
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


In StringSortedHitQueue, generateSortIndex seems to treat the TermEnum
having values as proof that the sort field has entries in the index.

When the search has matching results, an ArrayIndexOutOfBoundsException
is thrown in sortValue (line 177 of StringSortedHitQueue), because
generateSortIndex creates a terms array of zero length while fieldOrder
contains 0 for all documents.
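
The failure can be illustrated with a minimal standalone sketch that uses plain arrays instead of Lucene classes. The class EmptySortIndexDemo and its methods are hypothetical and only mimic the lookup that sortValue is described as doing above; this is not the actual Lucene code.

```java
// Hypothetical stand-in for the lookup described in the report; not Lucene code.
class EmptySortIndexDemo {

    // Map a document number to its term text via the per-document ordinal.
    static String sortValue(String[] terms, int[] fieldOrder, int doc) {
        return terms[fieldOrder[doc]];
    }

    // Returns true if the lookup throws for an index where no document
    // has a value in the sort field.
    static boolean lookupFails(int maxDoc) {
        String[] terms = new String[0];      // no term stored for the sort field
        int[] fieldOrder = new int[maxDoc];  // zero-filled: every doc points at index 0
        try {
            sortValue(terms, fieldOrder, 0);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;                     // terms[0] does not exist
        }
    }

    public static void main(String[] args) {
        System.out.println("lookup fails: " + lookupFails(3));
    }
}
```

Because Java zero-fills new int arrays, every document ends up pointing at terms[0], which does not exist when the array has length zero.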

It would seem more helpful if either:
a) generateSortIndex catches the lack of any documents with the sort field,

or

b) terms[0] is reserved as a special value for documents that do not have
matching sort field values, i.e. change the current implementation to add 1
to the index and set terms[0] so that "untagged" documents sort
first or last.

For my application I'd much prefer solution (b), as it allows much smaller
indexes and makes searching with sort values less brittle.

That's the best my communication skills can muster just now. The current
code could be changed to something like:

private final int[] generateSortIndex()
throws IOException {

        final int[] retArray = new int[reader.maxDoc()];
        final String[] mterms = new String[reader.maxDoc() + 1];  // guess length
        if (retArray.length > 0) {
                TermDocs termDocs = reader.termDocs();
                // change this value to control whether documents without the sort field come first or last
                mterms[0] = "";  // XXXXXXXXX change
                int t = 1;  // current term number  XXXXXXXXXXXXX change
                try {
        

                        do {
                                Term term = enumerator.term();
                                if (term.field() != field) break;

                                // store term text
                                // we expect that there is at most one term per document
                                if (t >= mterms.length) throw new RuntimeException
                                        ("there are more terms than documents in field \""+field+"\"");
                                mterms[t] = term.text();

                                // store which documents use this term
                                termDocs.seek (enumerator);
                                while (termDocs.next()) {
                                        retArray[termDocs.doc()] = t;
                                }

                                t++;
                        } while (enumerator.next());
                } finally {
                        termDocs.close();
                }

                // if there are less terms than documents,
                // trim off the dead array space
                if (t < mterms.length) {
                        terms = new String[t];
                        System.arraycopy (mterms, 0, terms, 0, t);
                } else {
                        terms = mterms;
                }
        }
        return retArray;
}
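
The same idea can be tried outside Lucene with a small self-contained sketch. It assumes the terms and their documents arrive as a sorted map, which is a hypothetical stand-in for walking the TermEnum and TermDocs; the class ReservedSlotSortIndex and the missingValue parameter are also made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of option (b); names are not from Lucene.
class ReservedSlotSortIndex {
    final String[] terms;    // terms[0] is the reserved "no value" slot
    final int[] fieldOrder;  // fieldOrder[doc] == 0 means the doc has no sort value

    // termToDocs: term text (in sorted order) -> numbers of the documents using it.
    ReservedSlotSortIndex(int maxDoc, TreeMap<String, int[]> termToDocs, String missingValue) {
        fieldOrder = new int[maxDoc];               // zero-filled by default
        terms = new String[termToDocs.size() + 1];  // one extra slot for untagged docs
        terms[0] = missingValue;
        int t = 1;                                  // real terms start at 1
        for (Map.Entry<String, int[]> e : termToDocs.entrySet()) {
            terms[t] = e.getKey();
            for (int doc : e.getValue()) {
                fieldOrder[doc] = t;
            }
            t++;
        }
    }

    // Always safe: ordinal 0 now resolves to the reserved slot.
    String sortValue(int doc) {
        return terms[fieldOrder[doc]];
    }

    // Compare by ordinal; untagged documents (ordinal 0) come first.
    int compare(int docA, int docB) {
        return Integer.compare(fieldOrder[docA], fieldOrder[docB]);
    }
}
```

With the reserved slot, an index in which no document has the sort field yields a terms array of length 1 rather than 0, so the lookup in sortValue stays in bounds. Sorting untagged documents last instead of first would need a different ordinal scheme or a comparator that special-cases ordinal 0.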

From a very quick look at IntegerSortedHitQueue, it would seem possible
to do the same thing there, maybe creating the Integer wrapper objects once.

Hope that made some sort of sense. I'm not very familiar with the code
or Lucene terminology.
If the above seems like a useful approach I'd be glad to generate patches
for a cleaned up version.

Thanks

Sam
