[trimming the post a bit] On Sep 18, 2005, at 11:51 AM, James Huang wrote:
The problem is quite generic, I believe. What I'd like to do is similar to LIA-ch6, i.e. to find a "good Chinese Hunan-style restaurant near me." I prefer Hunan-style; however, if a good Hunan-style one is 12 miles away while there is a Shanghai-style one only 2 miles away, I may want to take that instead. So it's not a simple multi-field sorting problem; it's an empirical ordering, and the parameters may have to be tuned by experiment. Thus far, I'm happy with the formula I gave earlier.
The example in LIA was purely a distance sort, not blended as you desire.
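To make "blended" concrete, here is a rough sketch in plain Java, outside of Lucene. Everything in it is an assumption for illustration: the class BlendedRankSketch, the Place fields, the styleScore values, the alpha weight, and above all the formula itself, which is a made-up placeholder and not the formula given earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: blends a style-preference score with distance.
class BlendedRankSketch {
    static final class Place {
        final String name;
        final double styleScore; // assumed: 1.0 = preferred style, lower = less preferred
        final double miles;      // assumed: distance from the searcher
        Place(String name, double styleScore, double miles) {
            this.name = name;
            this.styleScore = styleScore;
            this.miles = miles;
        }
    }

    // Placeholder formula: preference discounted by distance.
    // alpha controls how quickly distance erodes the preference.
    static double blended(double styleScore, double miles, double alpha) {
        return styleScore / (1.0 + alpha * miles);
    }

    // Returns a copy of the input ordered by blended score, best first.
    static List<Place> rank(List<Place> places, double alpha) {
        List<Place> sorted = new ArrayList<>(places);
        sorted.sort(Comparator.comparingDouble(
                (Place p) -> blended(p.styleScore, p.miles, alpha)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Place> places = new ArrayList<>();
        places.add(new Place("Hunan, 12 miles", 1.0, 12.0));
        places.add(new Place("Shanghai, 2 miles", 0.6, 2.0));
        for (Place p : rank(places, 0.25)) {
            System.out.println(p.name);
        }
    }
}
```

With these invented numbers and alpha = 0.25, the Shanghai-style place 2 miles away scores 0.4 and the Hunan-style place 12 miles away scores 0.25, so the nearer one ranks first. Move the Hunan-style place to 3 miles and it wins again. The weight would have to be found by experiment.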
Separately, earlier in this thread, you also mentioned "what if 10M search results?" -- that's also my concern, for both space and time. 1. Space-wise, the 10M Documents will be dragged into memory (in a Hits, say), right?
No, that is not correct, and this is an important point about Lucene and its ability to scale extremely well. Hits caches up to 200 documents (I believe), and the search scores a single document at a time, keeping only the top-scoring ones.
Lucene has no problem running a search whose Hits is massive in size.
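The general idea of scoring one document at a time and keeping only the best few can be sketched in plain Java. This is not Lucene's actual code. TopNSketch, ScoredDoc and topN are names made up for the illustration, and the float array stands in for whatever produces a score per document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of bounded top-N collection; not Lucene's implementation.
class TopNSketch {
    // Only this small (doc id, score) pair is retained, never a full document.
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Visits each score once and keeps at most n entries in memory.
    static List<ScoredDoc> topN(float[] scores, int n) {
        // Min-heap on score: the head is the weakest of the current top n.
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(
                Math.max(1, n),
                Comparator.comparingDouble((ScoredDoc sd) -> sd.score));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(new ScoredDoc(doc, scores[doc]));
            } else if (n > 0 && scores[doc] > heap.peek().score) {
                heap.poll(); // evict the weakest entry to make room
                heap.add(new ScoredDoc(doc, scores[doc]));
            }
        }
        // Return the survivors ordered best first.
        List<ScoredDoc> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(
                (ScoredDoc sd) -> sd.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
        for (ScoredDoc sd : topN(scores, 2)) {
            System.out.println("doc " + sd.doc);
        }
    }
}
```

The point of the sketch is that memory grows with n, the number of results kept, and not with the number of matching documents, so 10M matches cost time to score but not 10M objects in memory.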
There are memory considerations with sorting, though - these are described in detail in the javadocs and a little in LIA.
1. How to use compound scoring at search time (where you suggested a Query subclass, but what/how?)
I'm going to defer to others to assist with this, or validate that this is the right approach in this situation.
2. Space concern about large search result set.
With a Query subclass, this shouldn't be a concern. With sorting using Lucene's Sort there are some memory concerns, but less so than with your own TreeSet.
P.S. Feel free to reply to the list, if you think this has general appeal and others may benefit.
Done!

Erik