Re: Need help on a Lucene problem

André Warnier Fri, 02 Jan 2009 00:42:56 -0800

janis wrote:


Is there any way I can optimize this logic? Or, for that matter, my whole
approach/algorithm towards finding all jobs within 100 miles using Lucene?

Hi.
I don't know how Lucene works per se.
But think about what you are really doing with your logic :
You are telling the search engine to

- look (in the whole database) for all items which have city = city-1, and keep a list of these item numbers

- look for all items which have city = city-2, and keep a list of these item numbers

...

- look for all items which have city = city-864, and keep a list of these item numbers

- now combine all the item numbers above, and return a list of the unique item numbers among them


- look for all the items that have state = state-1, and keep a list..
- look for all ... state-2, and keep a list...
...

- now combine all these items and return a list of the unique item numbers among them

- now combine the list from the cities with the list from the states, and return a list of all unique item numbers among them


- look for all items which have skill = skill-1, and keep a list
...
... etc..

If your database contains 1,000,000 job items, no wonder it is taking 29 seconds.

You would be much better off doing a first query using the criteria that are the most restrictive (i.e. the ones that will probably give the fewest hits), then applying another query to that result set to get a smaller set, then applying another query to that set to restrict it even further, etc.
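To make the idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API at all: the `Job` and `Criterion` classes, their fields, and the `estimatedHits` figure are all assumptions made up for illustration. The point is only the ordering: the criterion expected to match the fewest items runs first, and every later criterion only looks at what survived.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

class NarrowingSearch {

    /** Hypothetical job item; the field names are assumptions. */
    static final class Job {
        final int id;
        final String city;
        final String skill;

        Job(int id, String city, String skill) {
            this.id = id;
            this.city = city;
            this.skill = skill;
        }
    }

    /** One search criterion plus a rough guess of how many items it matches. */
    static final class Criterion {
        final Predicate<Job> test;
        final long estimatedHits; // assumed to be known or guessed by the caller

        Criterion(Predicate<Job> test, long estimatedHits) {
            this.test = test;
            this.estimatedHits = estimatedHits;
        }
    }

    /** Applies the most restrictive criterion first, then narrows the survivors. */
    static List<Job> search(List<Job> all, List<Criterion> criteria) {
        List<Criterion> ordered = new ArrayList<>(criteria);
        ordered.sort(Comparator.comparingLong((Criterion c) -> c.estimatedHits));

        List<Job> current = all;
        for (Criterion c : ordered) {
            List<Job> next = new ArrayList<>();
            for (Job job : current) {
                if (c.test.test(job)) {
                    next.add(job);
                }
            }
            current = next;
            if (current.isEmpty()) {
                break; // nothing left to restrict further
            }
        }
        return current;
    }
}
```

With a skill criterion estimated at 50 hits and a city criterion estimated at 500, the skill test scans the full list once and the city test then only scans the few items that passed it.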

Another aspect is that search engines like Lucene are the right tool to use when you are searching words which occur in a text, in relative position to each other, and/or after stemming etc..

But they are not necessarily the best tool to use when you are looking for a strict (aka "stupid") string comparison, such as ' city == "New York" ', where the city name is in a field of its own and is in a fixed (predictable) form. (I mean that to search "New York" you can just compare the string "New York" and you do not have to do a query like "the word New next to the word York").

For example, since you already have your 864 city names in a table, in a known form, and since your items all have a field "city" in a known form, you could use Lucene to do the query excluding the city, get the list of results in an array, and then do a simple scan of your array in Java, keeping only the items that match one of your cities of choice (string comparison). The same for the State.

With 10,000 results and 864 cities, using perl this would probably take less than a second. Your mileage with Java may vary.
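A rough sketch of that scan in plain Java follows. The `Hit` class and the method name are hypothetical, and the list of hits stands in for whatever the Lucene query (run without the city clause) returned. Putting the city names in a `HashSet` is my own choice here, not something required: it turns the check into one lookup per result instead of up to 864 string comparisons per result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CityPostFilter {

    /** Hypothetical search result; only the fields needed for the scan. */
    static final class Hit {
        final int id;
        final String city;

        Hit(int id, String city) {
            this.id = id;
            this.city = city;
        }
    }

    /**
     * Keeps only the hits whose city is exactly one of the wanted names.
     * Assumes the city field and the wanted names share the same fixed form
     * (same case, same spelling), as described above.
     */
    static List<Hit> keepMatchingCities(List<Hit> hits, Set<String> wantedCities) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (wantedCities.contains(hit.city)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

The state could be handled by the same kind of scan with a second set of state names.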
