PhraseQuery Scorer for scoring sub phrase matches
-------------------------------------------------
Key: LUCENE-1853
URL: https://issues.apache.org/jira/browse/LUCENE-1853
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Environment: Lucene/Java
Reporter: Preetam Rao
Priority: Minor
Fix For: 2.9
For a query like "homes in new york with swimming pool", a document whose field
matches only "new york" should still be scored, and it should score higher
than a document with two separate matches on "new" and "york". Also, a 3-word
sub-phrase match should score considerably higher than a 2-word sub-phrase
match. (The boost factor should be configurable.)
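To make the intended ordering concrete, here is a minimal sketch in plain Java
(no Lucene classes). It is not the attached patch. The class name, the greedy
left-to-right scan, and the formula n * boost^(n-1) for a run of n matched
words are all assumptions chosen only so that longer sub-phrases win.

```java
import java.util.List;

// Hypothetical illustration only; not the scorer from the attached patch.
final class SubPhraseScoreSketch {

    // Longest run of consecutive query terms that starts at doc position d.
    static int longestRunAt(List<String> query, List<String> doc, int d) {
        int best = 0;
        for (int q = 0; q < query.size(); q++) {
            int len = 0;
            while (q + len < query.size() && d + len < doc.size()
                    && query.get(q + len).equals(doc.get(d + len))) {
                len++;
            }
            best = Math.max(best, len);
        }
        return best;
    }

    // Greedy scan over the field: each matched run of n words contributes
    // n * boost^(n-1) (an assumed formula), so with boost > 1 one 2-word
    // match outscores two separate 1-word matches.
    static double score(List<String> query, List<String> doc, double boost) {
        double total = 0.0;
        int d = 0;
        while (d < doc.size()) {
            int run = longestRunAt(query, doc, d);
            if (run == 0) {
                d++;
                continue;
            }
            total += run * Math.pow(boost, run - 1);
            d += run;
        }
        return total;
    }
}
```

With boost = 2, a field containing "visit new york" outscores one containing
"new furniture york city", and "in new york" outscores both.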
This kind of query is useful when a user query is taken as is, without parsing,
and searched against multiple fields, where each sub-phrase can match against a
different field.
Using shingles for this use case means that each field of each document, as
well as the query, needs to be indexed as shingles of all (1..N)-grams. (Please
correct me if I am wrong.)
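As a rough illustration of what "all (1..N)-grams" means for one field value,
the sketch below enumerates them in plain Java. The class and method names are
made up for this example, and it does not use Lucene's analysis chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every shingle of 1..maxSize consecutive tokens.
final class ShingleSketch {

    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the window from 1 token up to maxSize, staying in bounds.
            for (int size = 1; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }
}
```

For the 7-token example query with maxSize = 7 this produces 28 terms
(7 + 6 + ... + 1), which is the index growth the shingle approach implies for
every field.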
The scorer could also support:
- ignoring idf and/or field norms (so that factors outside the document
don't influence scoring)
- considering only the longest match (for example, a match on "new york" is
scored and considered rather than "new" furniture and "york" city)
- ignoring duplicates ("new york" appearing two or three times makes no
difference)
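The last two options can be sketched as switches applied to the list of matched
sub-phrases before scoring. This is only an illustration: the flag names, the
choice to keep a single longest match, and the n * boost^(n-1) formula are
assumptions, and idf or norm handling is left out because it depends on
Lucene's Similarity.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical option handling; names are illustrative, not from the patch.
final class SubPhraseOptionsSketch {

    static int wordCount(String phrase) {
        return phrase.trim().split("\\s+").length;
    }

    static double score(List<String> matches, boolean longestOnly,
                        boolean ignoreDuplicates, double boost) {
        List<String> kept;
        if (ignoreDuplicates) {
            // Repeated occurrences of the same sub-phrase count once.
            kept = new ArrayList<>(new LinkedHashSet<>(matches));
        } else {
            kept = new ArrayList<>(matches);
        }
        if (longestOnly && !kept.isEmpty()) {
            // Keep only the first match with the largest word count.
            String longest = kept.get(0);
            for (String m : kept) {
                if (wordCount(m) > wordCount(longest)) {
                    longest = m;
                }
            }
            kept = List.of(longest);
        }
        double total = 0.0;
        for (String m : kept) {
            int n = wordCount(m);
            total += n * Math.pow(boost, n - 1); // assumed scoring formula
        }
        return total;
    }
}
```

For matches "new york", "new york", "new" and boost = 2, turning on
ignoreDuplicates drops the repeated "new york", and turning on longestOnly
leaves a single "new york" to be scored.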
This kind of query (PhraseQuery with a SubPhraseScorer) could be combined with
a DisMax query. For example, something like Solr's dismax request handler could
be made to use this query, where we run a user query as is against all fields
and configure each field with the options above.
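The combination step can be sketched as follows, assuming each field has
already produced its own sub-phrase score. The rule used is the usual
disjunction-max one (best field score plus tieBreaker times the sum of the
other field scores). The class name and the field names in the usage note are
hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of DisMax-style combination of per-field scores.
final class DisMaxCombineSketch {

    static double combine(Map<String, Double> fieldScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores.values()) {
            max = Math.max(max, s);
            sum += s;
        }
        // The best field dominates; the remaining fields add a damped share.
        return max + tieBreaker * (sum - max);
    }
}
```

For example, with per-field scores title = 4.0 and city = 12.0, a tieBreaker of
0 keeps only the city score, while 0.5 adds half of the title score on top.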
I have also attached a patch with comments and test cases, in case my
description is not clear enough. I would appreciate alternatives or feedback.
The goal is to give more control via configuration when searching with
user-entered queries against multiple fields where sub-phrases have special
significance.