Re: Announcement: Lucene powering Monster job search index (Beta)

no spam Tue, 30 Jan 2007 16:51:09 -0800

This is very similar to what I do.  I use a hit collector to gather the
results, then filter outside a bounding box, then calculate the euclidian
distance.


Last time I tried to check your search it was down.  We were talking the
other day at work how job search was lacking among the big boards.  I'm
excited to check out your new page.

Mark

On 1/28/07, Peter Keegan <[EMAIL PROTECTED]> wrote:

Correction:
We only do the euclidan computation during sorting. For filtering, a
simple
bounding box is computed to approximate the radius, and 2 range
comparisons
are made to exclude documents. Because these comparisons are done outside
of
Lucene as integer comparisons, it is pretty fast. With 13000 results, the
seach time with distance sort is about 200 msec (compared to 30 ms for a
simple non-radius, date-sorted keyword search).

Peter

On 1/27/07, no spam <[EMAIL PROTECTED]> wrote:
>
> Isn't this extremely ineffecient to do the euclidean distance twice?
> Perhaps not a huge deal if a small search result set.  I at times have
> 13,000 results that match my search terms of an index with 1.2 million
> docs.
>
> Can't you do some simple radian math first to ensure it's way out of
> bounds,
> then do the euclidian distance for the subset within bounds?  I'm
> currently
> only doing the distance calc once (post hit collector). I don't have any
> performance numbers with the double vs single distance calc.
>
> I'm still working out the sort by radius myself.
>
> Mark
>
> On 11/3/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> >
> > Daniel,
> > Yes, this is correct if you happen to be doing a radius search and
> sorting
> > by mileage.
> > Peter
> >
> >
>
>

Re: Announcement: Lucene powering Monster job search index (Beta)

Reply via email to