On Dec 14, 2009, at 6:00 PM, Niclas Rothman wrote:

> Hi there,
> Perhaps this is far out but I need to get some advice on the following 
> problem.
> 
> We use Lucene for what it is really good at: finding documents by "relevance".
> After a search has been done and I have the hits in my hands, I need to do 
> some heavy sorting on this list, where the sorting data is stored in the 
> database, not in the Lucene index.
> Therefore I need to get all document ids for a search so I can fetch the 
> needed data from the database and afterwards apply my custom sorting.
> 
> How can I get all document ids from a search?
> Can this be done with ok performance?
> 
> I have been wondering whether I could do the sorting in Lucene, but I don't 
> feel comfortable with that at all because of the lack of information and 
> documentation.
> Also, the sorting should preferably be done just in time; that is, the 
> underlying sorting data changes constantly and I can't reindex as soon as 
> it changes.
> 
> Any ideas / suggestions?

I would look at implementing a custom comparator for the Sort instance in 
Lucene.  This requires implementing a FieldComparatorSource and a 
FieldComparator.  There are lots of examples of this in the Lucene code.  Note 
that the name FieldComparatorSource is a bit of a misnomer, as it doesn't have 
to be a Field (for instance, on SOLR-1297, I just implemented it to allow for 
sorts by Function Queries).  Naturally, getting this to perform with a database 
is going to be pretty tricky, but I think it will be way better than having to 
process all of the results a second time.  Having an effective caching strategy 
(similar to Lucene's FieldCache) will be important, since going to the database 
for every comparison would be far too slow.
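To make the comparator idea concrete without pulling in Lucene itself, here is a minimal, dependency-free Java sketch of the slot-based contract that Lucene 2.9's FieldComparator follows: copy a hit into a slot, compare two slots, set the "bottom" (weakest competitive) slot, and compare a new doc against that bottom. Everything here is hypothetical: the class names, the `docToExternalId` array (standing in for a FieldCache-style per-document id array), the `sortValues` map (standing in for a cache of database values) and the simple `topN` driver (standing in for Lucene's own top-N collector). A real version would extend org.apache.lucene.search.FieldComparator, also handle setNextReader/docBase for multi-segment readers, and be returned from a FieldComparatorSource.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free model of a slot-based comparator that orders hits
// by a value kept outside the index (e.g. cached rows from a database).
class ExternalValueComparatorDemo {

    static class ExternalValueComparator {
        private final String[] docToExternalId;       // doc number -> external id (FieldCache-like)
        private final Map<String, Double> sortValues; // external id -> cached database value
        private final double[] slotValues;            // values of the hits currently in the queue
        private double bottom;                        // value of the weakest competitive hit

        ExternalValueComparator(int numHits, String[] docToExternalId,
                                Map<String, Double> sortValues) {
            this.slotValues = new double[numHits];
            this.docToExternalId = docToExternalId;
            this.sortValues = sortValues;
        }

        // Reads from the in-memory cache only; never hits the database per comparison.
        // Missing rows sort as 0.0 (an arbitrary choice for this sketch).
        private double valueOf(int doc) {
            Double v = sortValues.get(docToExternalId[doc]);
            return v == null ? 0.0 : v;
        }

        int compare(int slot1, int slot2) { return Double.compare(slotValues[slot1], slotValues[slot2]); }
        void setBottom(int slot) { bottom = slotValues[slot]; }
        int compareBottom(int doc) { return Double.compare(bottom, valueOf(doc)); }
        void copy(int slot, int doc) { slotValues[slot] = valueOf(doc); }
    }

    // Keeps the numHits matching docs with the smallest sort values, returned in ascending order.
    static List<Integer> topN(int[] matchingDocs, int numHits, ExternalValueComparator cmp) {
        if (numHits <= 0) return new ArrayList<>();
        int[] slotDoc = new int[numHits];
        int filled = 0;
        int bottomSlot = -1;
        for (int doc : matchingDocs) {
            if (filled < numHits) {
                cmp.copy(filled, doc);
                slotDoc[filled++] = doc;
                if (filled == numHits) {
                    bottomSlot = worstSlot(cmp, numHits);
                    cmp.setBottom(bottomSlot);
                }
            } else if (cmp.compareBottom(doc) > 0) {
                // doc sorts before the current bottom: it replaces the bottom slot
                cmp.copy(bottomSlot, doc);
                slotDoc[bottomSlot] = doc;
                bottomSlot = worstSlot(cmp, numHits);
                cmp.setBottom(bottomSlot);
            }
        }
        Integer[] slots = new Integer[filled];
        for (int i = 0; i < filled; i++) slots[i] = i;
        Arrays.sort(slots, (a, b) -> cmp.compare(a, b));
        List<Integer> result = new ArrayList<>();
        for (int s : slots) result.add(slotDoc[s]);
        return result;
    }

    // Linear scan for the slot that sorts last; a real collector uses a priority queue.
    static int worstSlot(ExternalValueComparator cmp, int n) {
        int worst = 0;
        for (int i = 1; i < n; i++) if (cmp.compare(i, worst) > 0) worst = i;
        return worst;
    }

    public static void main(String[] args) {
        String[] docToId = {"a", "b", "c", "d", "e"};
        Map<String, Double> dbValues = new HashMap<>();
        dbValues.put("a", 5.0); dbValues.put("b", 1.0); dbValues.put("c", 3.0);
        dbValues.put("d", 4.0); dbValues.put("e", 2.0);
        ExternalValueComparator cmp = new ExternalValueComparator(3, docToId, dbValues);
        System.out.println(topN(new int[] {0, 1, 2, 3, 4}, 3, cmp)); // [1, 4, 2]
    }
}
```

The point of the slot design is that the expensive lookup happens once per candidate hit (in copy/compareBottom), while the many slot-to-slot comparisons only touch a primitive array.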

The other thing you could think about doing is loading a FieldCache with the 
ids (do it once when you load the IndexReader) and then using that together 
with a bit set that tells you which documents matched.
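A rough, dependency-free sketch of this second approach, under stated assumptions: `idCache` stands in for the per-document id array loaded once per IndexReader (in Lucene 2.9/3.0 something like FieldCache.DEFAULT.getStrings(reader, "id"), assuming the ids are indexed as a single untokenized field named "id"), `matches` stands in for the bit set a custom Collector would fill during the search, and `fetchSortValues` is a hypothetical batched database lookup. Only the ids of matching documents are looked up, and only at query time, so the sort data can change without reindexing.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class MatchedIdsSortDemo {

    // Returns the external ids of all matching docs, sorted ascending by a value
    // fetched on demand (e.g. from the database) for just those ids.
    static List<String> sortMatches(String[] idCache, BitSet matches,
                                    Function<List<String>, Map<String, Double>> fetchSortValues) {
        List<String> ids = new ArrayList<>(matches.cardinality());
        // Walk the set bits: each one is the doc number of a matching document.
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            ids.add(idCache[doc]);
        }
        Map<String, Double> values = fetchSortValues.apply(ids); // one batched lookup
        // Ids without a value sort as 0.0 (an arbitrary choice for this sketch).
        ids.sort(Comparator.comparingDouble(id -> values.getOrDefault(id, 0.0)));
        return ids;
    }

    public static void main(String[] args) {
        String[] idCache = {"p1", "p2", "p3", "p4"}; // loaded once per IndexReader
        BitSet matches = new BitSet(idCache.length);
        matches.set(0); matches.set(2); matches.set(3); // docs 0, 2 and 3 matched
        Map<String, Double> db = new HashMap<>();
        db.put("p1", 9.0); db.put("p2", 1.0); db.put("p3", 4.0); db.put("p4", 7.0);
        // Hypothetical stand-in for a batched "... WHERE id IN (...)" query.
        Function<List<String>, Map<String, Double>> fetch = ids -> {
            Map<String, Double> out = new HashMap<>();
            for (String id : ids) {
                Double v = db.get(id);
                if (v != null) out.put(id, v);
            }
            return out;
        };
        System.out.println(sortMatches(idCache, matches, fetch)); // [p3, p4, p1]
    }
}
```

The memory cost is the id array (one entry per document in the index) plus one bit per document for the matches; the database is queried once per search rather than once per comparison.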

In either case, you are making a tradeoff with memory.

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search
