Why can't this just be a Solr search against the collection of titles (and counts)? Solr will handle the approximate matching and ranking for you; you just need to tell it the rules (e.g. with the mm parameter).
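As a rough sketch of what such a query could look like, the snippet below builds a select URL for the edismax query parser with an mm (minimum should match) rule. The host, the core name "jobtitles", the field names "title" and "job_count", and the mm value "2<75%" are all assumptions for illustration; none of them come from your setup.

```python
from urllib.parse import urlencode

# Assumed values: a local Solr core named "jobtitles" holding one document
# per known title, with fields "title" (tokenized text) and "job_count".
SOLR_SELECT_URL = "http://localhost:8983/solr/jobtitles/select"


def build_title_query(searched_title, mm="2<75%", rows=5):
    """Return a Solr select URL matching a searched title against known titles."""
    params = {
        "q": searched_title,
        "defType": "edismax",  # query parser that honours the mm parameter
        "qf": "title",         # field to match the searched tokens against
        "mm": mm,              # with more than 2 clauses, 75% of them must match
        "fl": "title,job_count,score",
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_title_query("Senior Digital Marketing Specialist"))
```

With this mm value, a four-token search such as "Senior Digital Marketing Specialist" needs three of its tokens to match, so a known title like "Senior Digital Marketing" would still qualify. The right rule depends on your data.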
Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 22 January 2015 at 22:53, thakkar.aayush <[email protected]> wrote:
> I have master list of known job titles and associated job counts.
> I am looking for ways to extract the same from the searched term. For
> example:
>
> Searched job title: Senior Digital Marketing Specialist
> Extracted to: Senior Digital Marketing
>
> Searched job title: Retail In-Store Sales Assistant; Full Time
> Extracted to: Retail Sales Assistant
>
> For boosting up the term search I have planned to tokenize the searched job
> title and calculate:
> 1) The occurrence of the tokens taken 2 at a time and matching it with
> master list of job titles.
> 2) The occurrence of token taken 1 at time and finding the job count.
>
> Please suggest ideas for cleaning or extracting the job title. Should I look
> for some other parameters as well for better results.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Normailzing-Job-Title-tp4181388.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
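
For comparison, the two counts described in the quoted plan could be sketched by hand roughly as follows. This is only one reading of that plan: "tokens taken 2 at a time" is interpreted as unordered token pairs, and the master list and its job counts are made-up placeholders.

```python
import re
from itertools import combinations

# Hypothetical master list (title -> job count); the numbers are made up.
MASTER_TITLES = {
    "senior digital marketing": 120,
    "digital marketing manager": 80,
    "retail sales assistant": 300,
}


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def pair_occurrences(searched_title, master=MASTER_TITLES):
    """For each pair of searched tokens, count master titles containing both."""
    tokens = sorted(set(tokenize(searched_title)))
    master_token_sets = [set(tokenize(title)) for title in master]
    return {
        pair: sum(1 for known in master_token_sets if set(pair) <= known)
        for pair in combinations(tokens, 2)
    }


def token_job_counts(searched_title, master=MASTER_TITLES):
    """For each searched token, sum the job counts of master titles containing it."""
    counts = {}
    for token in set(tokenize(searched_title)):
        counts[token] = sum(
            count for title, count in master.items() if token in tokenize(title)
        )
    return counts
```

Pairs that occur in no known title (for example "senior" with "specialist" in the placeholder list) get a count of zero, which is one possible signal for dropping a token from the extracted title.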
