Hi there,
I am working on a web-based job search application using Lucene. Users on my
site can search for jobs within a 100-mile radius of, say, "Boston, MA" or
any other location.
Also, I need to show the search results sorted by "relevance" (i.e. the score
returned by Lucene) in descending order.
I'm using a 3rd-party API to fetch all the cities within a given radius of a
city. This API returns around 864 cities within a 100-mile radius of
"Boston, MA".
I'm building the city/state Lucene query using the following logic, which is
part of my "BuildNearestCitiesQuery" method.
Here nearestCities is a hashtable returned by the above API. It contains 864
cities, with CityName as key and StateCode as value.
finalQuery is a Lucene BooleanQuery object which contains the other search
criteria entered by the user, such as skills, keywords, etc.
/*code*/
foreach (string city in nearestCities.Keys)
{
    cityStateQuery = new BooleanQuery();
    queryCity = queryParserCity.Parse(city);
    queryState = queryParserState.Parse(((string[])nearestCities[city])[1]);
    cityStateQuery.Add(queryCity, BooleanClause.Occur.MUST);  // MUST is like an AND
    cityStateQuery.Add(queryState, BooleanClause.Occur.MUST);
    // Added inside the loop so that every city/state pair is OR'ed in
    nearestCityQuery.Add(cityStateQuery, BooleanClause.Occur.SHOULD);  // SHOULD is like an OR
}
finalQuery.Add(nearestCityQuery, BooleanClause.Occur.MUST);
/*code*/
I then pass the finalQuery object to Lucene's Search method to get all the
jobs within the 100-mile radius:
searcher.Search(finalQuery, collector);
I found that the BuildNearestCitiesQuery method takes about 29 seconds on
average to execute, which is obviously unacceptable for a website. I also
found that the statements involving "Parse" take considerably longer to
execute than the other statements.
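For reference, a timing like the one above can be taken with
System.Diagnostics.Stopwatch. The following is only a minimal sketch:
ParseTiming and Measure are hypothetical names, and the "parse" delegate
stands in for whatever call is being measured (for example
queryParserCity.Parse).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ParseTiming
{
    // Runs the supplied step once per city and returns the total elapsed time.
    // "parse" is a hypothetical stand-in for a call such as
    // queryParserCity.Parse(city); it is not part of Lucene itself.
    public static TimeSpan Measure(IEnumerable<string> cities, Action<string> parse)
    {
        Stopwatch watch = Stopwatch.StartNew();
        foreach (string city in cities)
        {
            parse(city);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

It could be called as ParseTiming.Measure(cityNames, c => queryParserCity.Parse(c)),
where cityNames is assumed to be a collection of the city name strings.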
The jobs for a given location are dynamic, in the sense that a city could
have 2 jobs (meeting particular search criteria) today, but zero jobs for the
same search criteria 3 days later. So I cannot use any caching here.
Is there any way I can optimize this logic, or for that matter my whole
approach/algorithm for finding all jobs within 100 miles using Lucene?
FYI, here is how my indexing in Lucene looks:
doc.Add(new Field("jobId", job.JobID.ToString().Trim(), Field.Store.YES,
    Field.Index.UN_TOKENIZED));
doc.Add(new Field("title", job.JobTitle.Trim(), Field.Store.YES,
    Field.Index.TOKENIZED));
doc.Add(new Field("description", job.JobDescription.Trim(), Field.Store.NO,
    Field.Index.TOKENIZED));
doc.Add(new Field("city", job.City.Trim(), Field.Store.YES,
    Field.Index.TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("state", job.StateCode.Trim(), Field.Store.YES,
    Field.Index.TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("citystate", job.City.Trim() + ", " + job.StateCode.Trim(),
    Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("datePosted", jobPostedDateTime, Field.Store.YES,
    Field.Index.UN_TOKENIZED));
doc.Add(new Field("company", job.HiringCoName.Trim(), Field.Store.YES,
    Field.Index.TOKENIZED));
doc.Add(new Field("jobType", job.JobTypeID.ToString(), Field.Store.NO,
    Field.Index.UN_TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("sector", job.SectorID.ToString(), Field.Store.NO,
    Field.Index.UN_TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("showAllJobs", "yy", Field.Store.NO,
    Field.Index.UN_TOKENIZED));
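Since the "citystate" field is indexed untokenized as City + ", " +
StateCode, a lookup key in that same format can be built from each city/state
pair with plain string concatenation and no parsing. The following is only a
sketch under assumptions: it supposes the city-to-state mapping is available
as an IDictionary<string, string> (the query code earlier casts the value to
string[], so the real shape may differ), and CityStateKeys and Build are
hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CityStateKeys
{
    // Builds "City, ST" strings in the same format used for the untokenized
    // "citystate" field at indexing time (city + ", " + state code).
    // Assumption: keys are city names and values are state codes.
    public static List<string> Build(IDictionary<string, string> nearestCities)
    {
        var keys = new List<string>(nearestCities.Count);
        foreach (KeyValuePair<string, string> entry in nearestCities)
        {
            keys.Add(entry.Key.Trim() + ", " + entry.Value.Trim());
        }
        return keys;
    }
}
```

Each resulting key could then be matched exactly against "citystate", for
instance through a TermQuery, though whether that fits the rest of the query
logic has not been tested here.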
Thanks a ton for reading! I would really appreciate your help on this.
Janis
--
View this message in context:
http://www.nabble.com/Need-help-on-a-Lucene-problem-tp21248342p21248342.html
Sent from the Lucene - General mailing list archive at Nabble.com.