[trimming the post a bit]

On Sep 18, 2005, at 11:51 AM, James Huang wrote:
The problem is quite generic, I believe. What I like
to do is similar to LIA-ch6, i.e. to find a "good
Chinese Hunan-style restaurant near me." I prefer
Hunan-style; however, if a good Hunan-style one is 12
miles away and there is a Shanghai-style one only 2 miles
away, I may want to take that instead. So it's not a simple
multi-sorting problem, it's an empirical ordering and
the parameters may have to be tuned by experiment. Thus far,
I'm happy with that formula I gave earlier.

The example in LIA was purely a distance sort, not blended as you desire.
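
To make "blended" concrete, here is a minimal sketch of one possible ordering. It is not the formula given earlier in this thread; the class name, the linear per-mile penalty, and the weights (1.0 for Hunan, 0.6 for Shanghai, 0.05 per mile) are all made-up values for illustration, and they are exactly the kind of parameters that would need experimenting with.

```java
// Hypothetical blended ranking: style preference minus a per-mile penalty.
// All names and weights here are illustrative assumptions, not from the thread.
final class BlendedScore {
    private BlendedScore() {}

    /** Returns a blended score; higher means the restaurant ranks earlier. */
    static double score(double stylePreference, double distanceMiles, double penaltyPerMile) {
        return stylePreference - penaltyPerMile * distanceMiles;
    }

    public static void main(String[] args) {
        double penaltyPerMile = 0.05; // assumed value, to be tuned by experiment
        double hunan = score(1.0, 12.0, penaltyPerMile);   // preferred style, far away
        double shanghai = score(0.6, 2.0, penaltyPerMile); // less preferred, close by
        System.out.println("Hunan, 12 miles:   " + hunan);
        System.out.println("Shanghai, 2 miles: " + shanghai);
        System.out.println(shanghai > hunan ? "Shanghai ranks first" : "Hunan ranks first");
    }
}
```

With these assumed weights the nearby Shanghai-style place outranks the distant Hunan-style one; lowering the per-mile penalty far enough flips the order, which is why it is not a plain multi-field sort.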

Separately, earlier in this thread, you also mentioned
"what if 10M search results?" -- that's also my
concern, for both space and time.

1. Space-wise, the 10M Documents will be dragged into
memory (in a Hits, say), right?

No, that is not correct, and this is an important point about Lucene and its ability to scale extremely well. Hits caches up to 200 documents (I believe), and the search scores documents one at a time and keeps only the top-scoring ones.
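
The "keep only the top scoring ones" idea can be sketched with a bounded min-heap. This is not Lucene's actual code; the class and method names below are hypothetical and it uses only java.util. It is meant to show why memory stays proportional to the number of hits kept rather than the number of hits matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: a bounded collector that keeps the best N scored documents.
final class TopDocsSketch {
    /** A document id paired with its score. */
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int capacity;
    private final PriorityQueue<ScoredDoc> heap; // min-heap: lowest kept score on top

    TopDocsSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Called once per matching document; never holds more than capacity entries. */
    void offer(int doc, float score) {
        if (heap.size() < capacity) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                          // evict the current lowest score
            heap.add(new ScoredDoc(doc, score));
        }
    }

    int size() { return heap.size(); }

    /** Returns the kept documents, highest score first. */
    List<ScoredDoc> bestFirst() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }

    public static void main(String[] args) {
        TopDocsSketch top = new TopDocsSketch(3);
        for (int doc = 0; doc < 1000; doc++) {
            top.offer(doc, (doc * 37) % 1000);    // synthetic scores for the demo
        }
        for (ScoredDoc d : top.bestFirst()) {
            System.out.println("doc=" + d.doc + " score=" + d.score);
        }
    }
}
```

Whether the loop sees a thousand matches or 10M, the heap never grows past the requested capacity, so the cost of a huge result set is time spent scoring, not memory spent holding documents.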

Lucene has no problem running a search whose Hits covers a massive number of results.

There are memory considerations with sorting, though - these are described in detail in the javadocs and a little in LIA.

1. How to use a compound scoring at search-time (where
you suggested a Query-subclass, but what/how?)

I'm going to defer to others to assist with this, or validate that this is the right approach in this situation.

2. Space concern about large search result set.

With a Query subclass, this shouldn't be a concern. With sorting using Lucene's Sort there are some memory concerns, but less so than with your own TreeSet.

P.S. Feel free to reply to the list, if you think this
has general appeal and others may benefit.

Done!

    Erik

