Re: Sort by relevance+distance

Erik Hatcher Sun, 18 Sep 2005 09:05:19 -0700

[trimming the post a bit]

On Sep 18, 2005, at 11:51 AM, James Huang wrote:

The problem is quite generic, I believe. What I'd like
to do is similar to LIA-ch6, i.e. to find a "good
Chinese Hunan-style restaurant near me." I prefer
Hunan-style; however, if a good Hunan-style one is 12
miles away while a Shanghai-style one is only 2 miles away, I
may want to take that instead. So it's not a simple
multi-sorting problem; it's an empirical ordering, and
the parameters may have to be tuned by experiment. Thus far,
I'm happy with that formula I gave earlier.

The example in LIA was purely a distance sort, not blended as you desire.
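To make the difference between a pure distance sort and a blended ordering concrete, here is a minimal sketch in plain Java (no Lucene classes). The blending formula, the `scaleMiles` parameter, the preference values and all class and method names are hypothetical illustrations; this is not the formula James gave earlier in the thread, which is not quoted here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BlendedRanking {

    // Hypothetical candidate: a relevance/preference score in [0, 1] plus a distance in miles.
    static final class Candidate {
        final String name;
        final double relevance;
        final double distanceMiles;

        Candidate(String name, double relevance, double distanceMiles) {
            this.name = name;
            this.relevance = relevance;
            this.distanceMiles = distanceMiles;
        }
    }

    // Hypothetical blend: relevance is divided by a penalty that grows linearly with
    // distance; scaleMiles is the distance at which the score is halved.
    // Illustrative only, not the formula from earlier in the thread.
    static double blendedScore(Candidate c, double scaleMiles) {
        return c.relevance / (1.0 + c.distanceMiles / scaleMiles);
    }

    // Returns a new list ordered best-first by blended score.
    static List<Candidate> rank(List<Candidate> candidates, double scaleMiles) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> blendedScore(c, scaleMiles)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Candidate> restaurants = new ArrayList<>();
        restaurants.add(new Candidate("Hunan-style", 1.0, 12.0));    // preferred, but far
        restaurants.add(new Candidate("Shanghai-style", 0.7, 2.0));  // less preferred, close by
        for (Candidate c : rank(restaurants, 5.0)) {
            System.out.println(c.name + " " + blendedScore(c, 5.0));
        }
    }
}
```

With these made-up numbers the nearby Shanghai-style place ranks first (0.7 / 1.4 = 0.5 against 1.0 / 3.4, roughly 0.29), while a Hunan-style place 3 miles away would win (1.0 / 1.6 = 0.625). The weights and `scaleMiles` are exactly the "parameters to experiment with" James mentions.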

Separately, earlier in this thread, you also mentioned
"what if 10M search results?" -- that's also my
concern, for both space and time.

1. Space-wise, the 10M Documents will be dragged into
memory (in a Hits, say), right?

No, that is not correct, and this is an important point about Lucene and its ability to scale extremely well. Hits caches up to 200 documents (I believe) and uses a mechanism that scores documents one at a time and keeps only the top-scoring ones.
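The "score one document at a time, keep only the top ones" idea can be illustrated with a bounded min-heap. This is a minimal sketch of the general technique using `java.util.PriorityQueue`, not Lucene's internal implementation; the class and method names (`TopKCollector`, `collect`, `top`) are hypothetical, and the scores in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKCollector {

    // One scored document: an internal doc number plus its score.
    static final class ScoredDoc {
        final int doc;
        final double score;

        ScoredDoc(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private static final Comparator<ScoredDoc> BY_SCORE =
        Comparator.comparingDouble((ScoredDoc d) -> d.score);

    private final int k;
    // Min-heap: the weakest retained document sits at the head.
    private final PriorityQueue<ScoredDoc> heap;

    TopKCollector(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.heap = new PriorityQueue<>(k, BY_SCORE);
    }

    // Called once per matching document; memory stays O(k) however many documents match.
    void collect(int doc, double score) {
        if (heap.size() < k) {
            heap.offer(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the current weakest
            heap.offer(new ScoredDoc(doc, score));
        }
    }

    int retained() {
        return heap.size();
    }

    // Returns the retained documents, best first.
    List<ScoredDoc> top() {
        List<ScoredDoc> result = new ArrayList<>(heap);
        result.sort(BY_SCORE.reversed());
        return result;
    }

    public static void main(String[] args) {
        TopKCollector collector = new TopKCollector(10);
        for (int doc = 0; doc < 10_000_000; doc++) {
            // Made-up deterministic scores standing in for real relevance scores.
            double score = (doc * 7919L % 1_000_003L) / 1_000_003.0;
            collector.collect(doc, score);
        }
        System.out.println("retained " + collector.retained() + " of 10000000");
        for (ScoredDoc d : collector.top()) {
            System.out.println(d.doc + " " + d.score);
        }
    }
}
```

Ten million matches pass through `collect`, but only ten `ScoredDoc` objects are ever alive at once, which is why the size of the match set alone does not translate into memory use.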

Lucene has no problem searching and returning Hits over a massive result set.

There are memory considerations with sorting, though; these are described in detail in the javadocs and a little in LIA.
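As a rough back-of-the-envelope check on the sorting memory mentioned above, the sketch below assumes that each numeric (int or float) sort field is cached as one 4-byte entry per document in the index (maxDoc), not per hit. That assumption, the method name `estimateBytes` and the 10M figure are illustrative; consult the Sort/FieldCache javadocs for the actual details, especially for String sort fields, which need more.

```java
public class SortCacheEstimate {

    // Rough lower bound on the sort cache for numeric sort fields.
    // Assumption: each int or float sort field is cached as one 4-byte entry per
    // document in the index (maxDoc), not per hit. Ignores array headers,
    // String sort fields (which also keep the unique term values) and other overhead.
    static long estimateBytes(long maxDoc, int bytesPerEntry, int sortFields) {
        return maxDoc * bytesPerEntry * sortFields;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;
        long bytes = estimateBytes(maxDoc, 4, 1);
        System.out.println(bytes + " bytes, about " + bytes / (1024 * 1024) + " MiB");
    }
}
```

Under that assumption, a 10M-document index sorted on one numeric field needs about 40,000,000 bytes (roughly 38 MiB) for the cache, regardless of how many documents a given query matches.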

1. How to use compound scoring at search time (you
suggested a Query subclass, but what/how?)

I'm going to defer to others to assist with this, or to validate that this is the right approach in this situation.

2. Space concern about large search result set.

With a Query subclass, this shouldn't be a concern. With sorting using Lucene's Sort, there are some memory concerns, but less so than with your own TreeSet.

P.S. Feel free to reply to the list, if you think this
has general appeal and others may benefit.


Done!

    Erik

