[ https://issues.apache.org/jira/browse/LUCENE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Preetam Rao updated LUCENE-1853:
--------------------------------

    Remaining Estimate:     (was: 336h)
     Original Estimate:     (was: 336h)

> SubPhraseQuery for matching and scoring sub phrase matches.
> -----------------------------------------------------------
>
>                 Key: LUCENE-1853
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1853
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>         Environment: Lucene/Java
>            Reporter: Preetam Rao
>            Priority: Minor
>         Attachments: LUCENE-1853.patch, LUCENE-1853.patch
>
>
> The goal is to give more control via configuration when searching with
> user-entered queries against multiple fields where sub phrases have special
> significance.
> For a query like "homes in new york with swimming pool", if a document's
> field matches only "new york", it should get scored, and it should get scored
> higher than two separate matches on "new" and "york". Also, a 3 word sub
> phrase match must be scored considerably higher than a 2 word sub phrase
> match. (The boost factor should be configurable.)
> Using shingles for this use case means that each field of each document, as
> well as the query, needs to be indexed as shingles of all (1..N)-grams.
> (Please correct me if I am wrong.)
> The query could also support
> - ignoring idf and/or field norms (so that factors outside the document
> don't influence scoring)
> - considering only the longest match (for example, a match on "new york" is
> scored and considered rather than "new" furniture and "york" city)
> - ignoring duplicates ("new york" appearing twice or thrice does not make any
> difference)
> This kind of query could be combined with a DisMax query. For example,
> something like Solr's dismax request handler can be made to use this query,
> where we run a user query as is against all fields and configure each
> field with the above configurations.
> I have also attached a patch with comments and test cases in case my
> description is not clear enough.
> Would appreciate alternatives or feedback.
> Example Usage:
> <code>
>    // sub phrase config
>    SubPhraseQuery.SubPhraseConfig conf = new SubPhraseQuery.SubPhraseConfig();
>    conf.ignoreIdf = true;
>    conf.ignoreFieldNorms = true;
>    conf.matchOnlyLongest = true;
>    conf.ignoreDuplicates = true;
>    conf.phraseBoost = 2;
>    // phrase query as usual
>    SubPhraseQuery pq = new SubPhraseQuery();
>    pq.add(new Term("f", term));
>    pq.add(new Term("f", term));
>    pq.setSubPhraseConf(conf);
>    Hits hits = searcher.search(pq);
> </code>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
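
To make the scoring behaviour described in the issue concrete, here is a minimal, Lucene-free sketch of how "longest match only", "ignore duplicates" and a configurable phrase boost might interact. It is not the code from the attached patch. The class name SubPhraseSketch, the greedy left-to-right matching, and the formula n * phraseBoost^(n-1) for a run of n terms are all assumptions chosen for illustration, and idf and field norms are simply left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical stand-in for the scoring idea, not the patch's API. */
public class SubPhraseSketch {

    /**
     * Greedily walks the query left to right and, at each position, takes the
     * longest run of consecutive query terms that appears contiguously anywhere
     * in the field. Each run is reported once, however often it occurs in the
     * field (the "ignore duplicates" behaviour).
     */
    public static List<List<String>> longestMatches(List<String> query, List<String> field) {
        List<List<String>> matches = new ArrayList<>();
        int q = 0;
        while (q < query.size()) {
            int best = 0;
            for (int f = 0; f < field.size(); f++) {
                int len = 0;
                while (q + len < query.size() && f + len < field.size()
                        && query.get(q + len).equals(field.get(f + len))) {
                    len++;
                }
                best = Math.max(best, len);
            }
            if (best > 0) {
                matches.add(query.subList(q, q + best));
                q += best;   // skip past the matched run ("longest match only")
            } else {
                q++;         // this query term does not occur in the field
            }
        }
        return matches;
    }

    /**
     * Assumed scoring formula: a run of n terms contributes n * phraseBoost^(n-1),
     * so longer sub phrases outweigh the same terms matched separately.
     */
    public static double score(List<String> query, List<String> field, double phraseBoost) {
        double total = 0.0;
        for (List<String> run : longestMatches(query, field)) {
            int n = run.size();
            total += n * Math.pow(phraseBoost, n - 1);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList(
                "homes", "in", "new", "york", "with", "swimming", "pool");
        // "new york" matched as a 2 word sub phrase
        System.out.println(score(query, Arrays.asList("new", "york", "city"), 2.0));
        // "new" and "york" matched only as separate terms
        System.out.println(score(query, Arrays.asList("new", "furniture", "from", "york"), 2.0));
    }
}
```

With phraseBoost = 2, the field "new york city" scores 4.0 while "new furniture from york" scores 2.0, which is the ordering the issue asks for. A field containing "new york" twice still scores 4.0 under this sketch.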