Need help on a Lucene problem

janis Thu, 01 Jan 2009 21:57:13 -0800

Hi there,
 
I am working on a web-based job search application using Lucene. Users on my
site can search for jobs within a radius of 100 miles of, say, "Boston, MA"
or any other location.
I also need to show the search results sorted by relevance (i.e., the score
returned by Lucene) in descending order.


I'm using a third-party API to fetch all the cities within a given radius of
a city. This API returns around 864 cities within a 100-mile radius of
"Boston, MA".

I'm building the city/state Lucene query using the following logic, which is
part of my "BuildNearestCitiesQuery" method.
Here, nearestCities is a hashtable returned by the above API. It contains 864
cities, with CityName as the key and StateCode as the value.
finalQuery is a Lucene BooleanQuery object that contains the other search
criteria entered by the user, such as skills, keywords, etc.

/*code*/
foreach (string city in nearestCities.Keys)
{
    BooleanQuery tempFinalQuery = finalQuery;

    cityStateQuery = new BooleanQuery();
    queryCity = queryParserCity.Parse(city);
    queryState = queryParserState.Parse(((string[])nearestCities[city])[1]);

    cityStateQuery.Add(queryCity, BooleanClause.Occur.MUST); // MUST is like an AND
    cityStateQuery.Add(queryState, BooleanClause.Occur.MUST);

    // Added inside the loop so that every city/state pair is OR'ed in, not just the last one
    nearestCityQuery.Add(cityStateQuery, BooleanClause.Occur.SHOULD); // SHOULD is like an OR
}

finalQuery.Add(nearestCityQuery, BooleanClause.Occur.MUST);
/*code*/

 


I then pass the finalQuery object to Lucene's Search method to get all the
jobs within the 100-mile radius:

searcher.Search(finalQuery, collector);

 

I found that this BuildNearestCitiesQuery method takes a whopping 29 seconds
on average to execute, which is obviously unacceptable for a website. I also
found that the statements involving "Parse" take considerably longer to
execute than the other statements.
 
The jobs for a given location are dynamic, in the sense that a city could
have 2 jobs (meeting a particular set of search criteria) today but zero jobs
for the same search criteria 3 days later. So I cannot use any caching here.

Is there any way I can optimize this logic, or for that matter my whole
approach/algorithm for finding all jobs within 100 miles using Lucene?
 

FYI, here is what my Lucene indexing looks like:

 

doc.Add(new Field("jobId", job.JobID.ToString().Trim(), Field.Store.YES,
Field.Index.UN_TOKENIZED));

doc.Add(new Field("title", job.JobTitle.Trim(), Field.Store.YES,
Field.Index.TOKENIZED));

doc.Add(new Field("description", job.JobDescription.Trim(), Field.Store.NO,
Field.Index.TOKENIZED));

doc.Add(new Field("city", job.City.Trim(), Field.Store.YES,
Field.Index.TOKENIZED , Field.TermVector.YES));

doc.Add(new Field("state", job.StateCode.Trim(), Field.Store.YES,
Field.Index.TOKENIZED, Field.TermVector.YES));

doc.Add(new Field("citystate", job.City.Trim() + ", " +
job.StateCode.Trim(), Field.Store.YES, Field.Index.UN_TOKENIZED ,
Field.TermVector.YES));

doc.Add(new Field("datePosted", jobPostedDateTime, Field.Store.YES,
Field.Index.UN_TOKENIZED));

doc.Add(new Field("company", job.HiringCoName.Trim(), Field.Store.YES,
Field.Index.TOKENIZED));

doc.Add(new Field("jobType", job.JobTypeID.ToString(), Field.Store.NO,
Field.Index.UN_TOKENIZED,Field.TermVector.YES));

doc.Add(new Field("sector", job.SectorID.ToString(), Field.Store.NO,
Field.Index.UN_TOKENIZED, Field.TermVector.YES));

doc.Add(new Field("showAllJobs", "yy", Field.Store.NO,
Field.Index.UN_TOKENIZED));
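
Since the "citystate" field above is indexed UN_TOKENIZED as "City, ST", one
option that might avoid the per-city Parse calls is to build TermQuery
clauses against that field directly. The following is only a minimal sketch
and has not been measured. It assumes the Lucene.Net 2.x-style API used in
this post, and that nearestCities maps a city name (string) to a state code
(string) as described in the text. The class and method names are
hypothetical.

```csharp
using System.Collections;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of the original code.
public static class NearestCitiesQueryBuilder
{
    // Assumption: keys are city names and values are state codes (both strings).
    public static BooleanQuery Build(Hashtable nearestCities)
    {
        BooleanQuery nearestCityQuery = new BooleanQuery();

        foreach (DictionaryEntry entry in nearestCities)
        {
            string city = ((string)entry.Key).Trim();
            string state = ((string)entry.Value).Trim();

            // The term text must match the indexed value exactly ("City, ST"),
            // because an UN_TOKENIZED field is not analyzed or lowercased.
            Term term = new Term("citystate", city + ", " + state);

            // One SHOULD clause per city: a job matches if it is in any of them.
            nearestCityQuery.Add(new TermQuery(term), BooleanClause.Occur.SHOULD);
        }

        return nearestCityQuery;
    }
}
```

The result would be added to finalQuery with BooleanClause.Occur.MUST, as in
the loop-based version. With 864 cities this produces 864 clauses, which is
below BooleanQuery's default limit of 1024 clauses; a larger radius could
exceed it.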


Thanks a ton for reading! I would really appreciate your help on this.
 
 
Janis
-- 
View this message in context: 
http://www.nabble.com/Need-help-on-a-Lucene-problem-tp21248342p21248342.html
Sent from the Lucene - General mailing list archive at Nabble.com.
