Order of fields within a Document in Lucene 2.4+

Matt Turner Thu, 25 Jun 2009 13:33:34 -0700

The Lucene FAQ says...
 
What is the order of fields returned by Document.fields()?
* Fields are returned in the same order they were added to the document. 
(now getFields() as fields is deprecated)
 
However I think this may no longer be the case in 2.4 
 
We are indexing documents in a specific order so that we can LOAD_AND_BREAK out 
of our FieldSelector as early as possible.
i.e. we have typically 50 indexed fields for a document, but when we are 
loading results with .doc(), we know we only need 4 of them.
 
So, our code ensures that these are added to the index first - and once the 4th 
field is loaded we break out of the selector.
 
This speeds us up by an order of magnitude.
 
 
 
However, we are finding that our field selector is processing fields in 
alphabetical order, not order of addition.  This means that we'd have to rename 
our fields to 'aaa..' in order to guarantee they'd be processed first.
 
 
I think, but am not sure, that this bit of code causes the problem (as spotted 
in http://www.mail-archive.com/[email protected]/msg24105.html).
It seems to have been introduced in version 2.4 (fields are in addition order 
in 2.3.2)
 
DocFieldProcessorPerThread.java:


   // If we are writing vectors then we must visit
   // fields in sorted order so they are written in
   // sorted order.  TODO: we actually only need to
   // sort the subset of fields that have vectors
   // enabled; we could save [small amount of] CPU
   // here.
   quickSort(fields, 0, fieldCount-1);

 
This appears to sort fields into alphabetical order.
 
Assuming that implementing the TODO would keep them in order of addition (and 
just keep vectors fields themselves sorted) - is it worth raising a JIRA to fix 
this ?
 
 
regards,
 
matt
 

 

_________________________________________________________________
Get the best of MSN on your mobile
http://clk.atdmt.com/UKM/go/147991039/direct/01/

Order of fields within a Document in Lucene 2.4+

Reply via email to