janis wrote:

Is there any way I can optimize this logic? Or, for that matter, my whole
approach/algorithm towards finding all jobs within 100 miles using Lucene?
Hi.
I don't know how Lucene works per se.
But think about what you are really doing with your logic.
You are telling the search engine to:

- look (in the whole database) for all items which have city = city-1, and keep a list of these item numbers
- look for all items which have city = city-2, and keep a list of these item numbers
...
- look for all items which have city = city-864, and keep a list of these item numbers

- now combine all the item numbers above, and return a list of the unique item numbers among them

- look for all the items that have state = state-1, and keep a list..
- look for all ... state-2, and keep a list...
...
- now combine all these items and return a list of the unique item numbers among them

- now combine the list from the cities, with the list from the states, and return a list of all unique item numbers among them

- look for all items which have skill = skill-1, and keep a list
...
... etc..

If your database contains 1,000,000 job items, no wonder it is taking 29 seconds.
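
To picture the work listed above, here is a toy model in Java. It is only an illustration and not how Lucene is actually implemented; the class UnionSketch, the method unionOfMatches and the map-based "index" are all hypothetical names made up for this sketch.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene's internal implementation.
class UnionSketch {
    // Hypothetical index: field value -> item numbers that carry that value.
    static Set<Integer> unionOfMatches(Map<String, List<Integer>> index,
                                       List<String> values) {
        Set<Integer> unique = new TreeSet<>();
        for (String value : values) {
            // One lookup per clause: with 864 cities this loop runs 864 times,
            // and every matching item number has to be merged into the set.
            unique.addAll(index.getOrDefault(value, Collections.emptyList()));
        }
        return unique;
    }
}
```

Each extra city, state or skill value adds one more list to fetch and merge, which is why a query with hundreds of such clauses gets expensive on a large collection.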

You would be much better off running a first query with the most restrictive criteria (the ones that will probably give the fewest hits), then applying another query to that result set to get a smaller set, then applying another query to that set to restrict it even further, and so on.
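
As a rough sketch of that idea in plain Java (no Lucene calls, since I don't know its API): the Job and Criterion classes, the estimatedHits field and the narrow method are all hypothetical, and the hit estimates are assumed to come from somewhere else, e.g. counts you already keep.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

class NarrowingSketch {
    // Hypothetical job record with the three fields discussed in this thread.
    static final class Job {
        final String city, state, skill;
        Job(String city, String state, String skill) {
            this.city = city; this.state = state; this.skill = skill;
        }
    }

    // Hypothetical criterion: a test plus a guess at how many items it matches.
    static final class Criterion {
        final long estimatedHits;
        final Predicate<Job> test;
        Criterion(long estimatedHits, Predicate<Job> test) {
            this.estimatedHits = estimatedHits; this.test = test;
        }
    }

    static List<Job> narrow(List<Job> all, List<Criterion> criteria) {
        // Most restrictive (fewest expected hits) first.
        List<Criterion> ordered = new ArrayList<>(criteria);
        ordered.sort(Comparator.comparingLong(c -> c.estimatedHits));

        List<Job> current = all;
        for (Criterion c : ordered) {
            List<Job> next = new ArrayList<>();
            for (Job j : current) {
                if (c.test.test(j)) next.add(j);
            }
            current = next;               // each pass scans a smaller set
            if (current.isEmpty()) break; // nothing left to restrict
        }
        return current;
    }
}
```

Only the first pass touches the whole collection; every later pass works on what the previous one left over.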

Another aspect is that search engines like Lucene are the right tool to use when you are searching for words which occur in a text, in relative position to each other, and/or after stemming etc. But they are not necessarily the best tool to use when you are looking for a strict (aka "stupid") string comparison, such as ' city == "New York" ', where the city name is in a field of its own and is in a fixed (predictable) form. (I mean that to search for "New York" you can just compare the string "New York", and you do not have to do a query like "the word New next to the word York".)

For example, since you already have your 864 city names in a table, in a known form, and since your items all have a field "city" in a known form, you could use Lucene to do the query excluding the city, get the list of results in an array, and then do a simple scan of your array in Java, keeping only the items that match one of your cities of choice (string comparison). The same goes for the state. With 10,000 results and 864 cities, using Perl this would probably take less than a second. Your mileage with Java may vary.
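
A minimal sketch of that post-filtering step, assuming the search results have already been copied into a list of simple objects. The Hit class and the keepWantedCities method are hypothetical, not part of Lucene. I put the city names into a HashSet, which is my own choice so that each result needs one lookup instead of up to 864 string comparisons.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CityPostFilter {
    // Hypothetical holder for one search result: item number plus its city field.
    static final class Hit {
        final int itemNumber;
        final String city;
        Hit(int itemNumber, String city) {
            this.itemNumber = itemNumber; this.city = city;
        }
    }

    // Keep only the hits whose city is exactly one of the wanted names.
    static List<Hit> keepWantedCities(List<Hit> hits, Collection<String> wantedCities) {
        Set<String> wanted = new HashSet<>(wantedCities); // e.g. the 864 names
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            // Strict string match: assumes both sides use the same spelling and case.
            if (wanted.contains(h.city)) kept.add(h);
        }
        return kept;
    }
}
```

The same method can be reused for the state by passing the state field and the list of wanted states instead.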
