Re: results sorting

Winton Davies Tue, 19 Feb 2002 09:48:42 -0800

Yep, I can confirm this kind of approach works. If you truly have 
massive amounts of data and they are in memory, you might also want 
to consider making the documents a monolithic byte/char array and 
have a computed index (i.e., a roll ya own array). This way you avoid a 
lot of GC work in high-transaction systems. Make the byte[] a static 
final and the GC sees one long-lived array with no references inside 
it to trace, rather than millions of small objects.
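
To make the byte-array idea concrete, here is a minimal sketch. The 
class name PackedDocs and its methods are made up for illustration and 
are not part of Lucene. It assumes the documents fit in a single Java 
array (under 2 GB) and are stored as UTF-8. Each document is located 
by a computed index of start offsets into the one shared byte[].

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: all documents packed into one byte[] plus an offset index. */
final class PackedDocs {
    private final byte[] data;    // every document's bytes, stored back to back
    private final int[] offsets;  // offsets[i] = start of doc i; offsets[n] = data.length

    private PackedDocs(byte[] data, int[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    /** Packs the given documents into a single array (assumes total size < 2 GB). */
    static PackedDocs of(String[] docs) {
        byte[][] encoded = new byte[docs.length][];
        int[] offsets = new int[docs.length + 1];
        for (int i = 0; i < docs.length; i++) {
            encoded[i] = docs[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        byte[] data = new byte[offsets[docs.length]];
        for (int i = 0; i < docs.length; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
        return new PackedDocs(data, offsets);
    }

    /** Number of documents stored. */
    int size() {
        return offsets.length - 1;
    }

    /** Length of one document in bytes, computed from adjacent offsets. */
    int length(int doc) {
        return offsets[doc + 1] - offsets[doc];
    }

    /** Decodes one document; this allocates a new String on every call. */
    String get(int doc) {
        return new String(data, offsets[doc], length(doc), StandardCharsets.UTF_8);
    }
}
```

Note that get() still allocates a String per call, so a hot path would 
want to work on the bytes in place (via the offsets) to keep the GC 
savings. Holding one instance in a static final field keeps it alive 
for the life of the process.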


Winton


>Doug,
>
>Thank you very much for your detailed response.  I'll give your suggestions a
>try.  I'm *very* impressed so far with Lucene.  Performance is terrific.
>
>Kind Regards,
>
>Chris Opler
>
>Doug Cutting wrote:
>
>>  > From: Chris Opler [mailto:[EMAIL PROTECTED]]
>>  >
>>  > Am wondering if there is any facility to sort search hits by
>>  > fields in the
>>  > Document.
>>
>>  No, there's nothing like this built in to Lucene.
>>
>>  This can be very expensive with large collections, since it requires reading
>>  a Document object for every hit.  Reading a Document requires a
>>  random-access disk read.  And when someone includes a common word in a
>>  query, there can be lots of hits, far more than will ever be viewed by the
>>  user.
>>
>>  An exception is date sorting, which can be easily implemented using a
>>  HitCollector.  Documents are delivered to a hit collector in the order they
>>  were added to the index, so returning the oldest or most recent hits can be
>>  done without reading field values.  This is discussed more in:
>>    http://www.mail-archive.com/[email protected]/msg00228.html
>>  Someday this will be built into Lucene...
>>
>>  To implement efficient field sorting for a large collection you could
>>  construct a fast index of a field (e.g., an in-memory array) and then
>>  implement a HitCollector which uses this.  For example, you could construct
>>  an array of floats for a "price" field.  Then your hit collector could do
>>  something like:
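>>    // prices[] is the in-memory field array; hits is a collection kept
>>    // sorted by price, holding at most maxHitCount entries.
>>    class MyCollector extends HitCollector {
>>      private float maxPrice = Float.MAX_VALUE;
>>      public final void collect(int doc, float score) {
>>        float price = prices[doc];
>>        if (price <= maxPrice) {
>>          hits.add(price, doc);
>>          if (hits.size() > maxHitCount) {
>>            hits.remove(hits.lastKey());  // drop the highest-priced hit
>>            maxPrice = hits.lastKey();    // new cutoff for later hits
>>          }
>>        }
>>      }
>>    }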
>>
>>  Also, if your collection is small, you can probably afford to simply
>>  enumerate all hit documents and sort them as you wish.
>>
>>  Doug
>>
>>  --
>>  To unsubscribe, e-mail: 
>><mailto:[EMAIL PROTECTED]>
>>  For additional commands, e-mail: 
>><mailto:[EMAIL PROTECTED]>
>
>--
>=======================
>http://www.openwine.org
>
>
>


-- 

Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/

