[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

Hoss Man (JIRA) Tue, 21 Jun 2011 12:16:10 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052771#comment-13052771 ]


Hoss Man commented on LUCENE-3130:
----------------------------------

bq. A QP can already solve this issue today, simply by boosting down terms with 
positionIncrement = 0.

That assumes:
a) that every TokenFilter which might inject terms like this will always put 
the most important one first
b) that the amount of boost should be fixed
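
To make assumptions (a) and (b) concrete, here is a minimal, Lucene-free Java sketch of the "boost down terms with positionIncrement = 0" heuristic. `Tok`, `weightedTerms` and the `term^boost` strings are hypothetical stand-ins, not Lucene or QueryParser API. The sketch shows that one fixed boost is applied to every stacked token, and that the user's own term gets penalized if a filter happens to emit the synonym first.

```java
import java.util.ArrayList;
import java.util.List;

public class PosIncBoostSketch {

    // Hypothetical stand-in for a token: its text and position increment.
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    // The heuristic: a token with posInc == 0 is stacked on the previous
    // token's position and always gets the same fixed boost.
    static List<String> weightedTerms(List<Tok> tokens, float stackedBoost) {
        List<String> out = new ArrayList<>();
        for (Tok t : tokens) {
            float boost = (t.posInc == 0) ? stackedBoost : 1.0f;
            out.add(t.text + "^" + boost);
        }
        return out;
    }

    public static void main(String[] args) {
        // Filter emits the user's term first: the heuristic does what we want.
        List<Tok> originalFirst = List.of(new Tok("fast", 1), new Tok("quick", 0));
        System.out.println(weightedTerms(originalFirst, 0.5f)); // [fast^1.0, quick^0.5]

        // Filter emits the synonym first: the user's own term is demoted.
        List<Tok> synonymFirst = List.of(new Tok("quick", 1), new Tok("fast", 0));
        System.out.println(weightedTerms(synonymFirst, 0.5f)); // [quick^1.0, fast^0.5]
    }
}
```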

What I'm suggesting is that we make this more flexible, so that people wiring 
together their apps and analyzers have an easy way to guide the QueryParser's 
behavior.  If we allow a well-defined attribute for this, people can have 
custom analysis that specifies arbitrary boosts in cases we may not be able to 
specifically anticipate (synonyms, entity recognition, common-word demoting, 
etc.).
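
A minimal Lucene-free sketch of that alternative, assuming each token may carry an optional float boost (a stand-in for the proposed attribute; `null` means no boost was set, matching the back-compat default suggested in the issue). The hypothetical `toQueryString` turns tokens stacked at one position into an OR group whose clauses keep their own boosts. `Tok`, `toQueryString` and the printed query syntax are illustrative assumptions, not the real QueryParser.

```java
import java.util.ArrayList;
import java.util.List;

public class BoostAttrSketch {

    // Hypothetical token: text, position increment, and an optional boost that
    // some filter (synonyms, entity recognition, common-word demotion, ...)
    // may set. null means "no boost attribute", so no boost is applied.
    static final class Tok {
        final String text;
        final int posInc;
        final Float boost;
        Tok(String text, int posInc, Float boost) {
            this.text = text; this.posInc = posInc; this.boost = boost;
        }
    }

    // Tokens stacked at one position (posInc == 0) become an OR group; each
    // clause carries whatever boost the analysis chain set on that token,
    // regardless of the order in which the tokens were emitted.
    static String toQueryString(List<Tok> tokens) {
        List<List<String>> positions = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.posInc > 0 || positions.isEmpty()) {
                positions.add(new ArrayList<>());
            }
            String clause = (t.boost == null) ? t.text : t.text + "^" + t.boost;
            positions.get(positions.size() - 1).add(clause);
        }
        List<String> groups = new ArrayList<>();
        for (List<String> p : positions) {
            groups.add(p.size() == 1 ? p.get(0) : "(" + String.join(" OR ", p) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        // The synonym filter emitted "quick" first but marked it as worth less.
        List<Tok> tokens = List.of(
            new Tok("quick", 1, 0.5f),
            new Tok("fast", 0, null),
            new Tok("car", 1, null));
        System.out.println(toQueryString(tokens)); // (quick^0.5 OR fast) car
    }
}
```

Because the boost travels with the token, the result no longer depends on which token a filter emits first, and different filters can pick different values.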

bq. But I really think the implementation details of QP should remain in QP, 
the analysis chain should instead be general and describe up the text.

Why don't you consider an attribute that denotes "this term is worth less than 
a typical term" to be a general description of the text?

bq. Otherwise, things get really confusing, e.g. what should a ShingleFilter do 
when it combines two tokens that have different BoostAttributes?

It does whatever it already does when it encounters two tokens that may have 
attributes it doesn't know about (it ignores them when creating the new token, 
if I remember correctly).  Unrecognized attributes aren't a new problem.

bq. If you do what you describe, what if you then want to tweak the ranking for 
synonyms? You must reindex.

How is that any different from any other aspect of index-time synonyms?  If you 
use them, you *always* have to reindex when you change your synonyms.

I'm not arguing that index-time synonyms are a good idea in general, and I'm 
not arguing that this "we look for BoostAttributes on tokens" feature of the QP 
would be useful (or even a good idea) for everyone.  I'm arguing that such a 
feature would give people who are already customizing their analysis an easy 
way to modify/influence the behavior of the query parser (w/o subclassing), 
one that could still work in conjunction with other techniques.

> Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
> give lower boosts
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> A recent thread asked if there was any way to use query-time synonyms such that 
> matches on the original term specified by the user would score higher than 
> matches on the synonym.  It occurred to me later that a float Attribute could 
> be set by the SynonymFilter in such situations, and QueryParser could use 
> that float as a boost in the resulting Query.  This would be fairly 
> straightforward for the simple "synonyms => BooleanQuery" case, but we'd have 
> to decide how to handle the case of synonyms with multiple terms that produce 
> MTPQ (possibly just punt for now).
> Likewise, there may be other TokenFilters that "inject" artificial tokens at 
> query time where it also might make sense to have a reduced "boost" factor...
> * SynonymFilter
> * CommonGramsFilter
> * WordDelimiterFilter
> * etc...
> In all of these cases, the amount of the "boost" could be configured, and for 
> back-compat could default to "1.0" (or null to not set a boost at all).
> Furthermore: if we added a new BoostAttrToPayloadAttrFilter that just copied 
> the boost attribute into the payload attribute, these same filters (when used 
> at index time) could give "penalizing" payloads to terms.
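
The proposed BoostAttrToPayloadAttrFilter does not exist; the following is a minimal sketch of its core step only, assuming the boost is stored as 4 big-endian IEEE 754 bytes. A real filter would write into Lucene's payload attribute; here the payload is just a `byte[]`, and `boostToPayload`/`payloadToBoost` are hypothetical helpers. Reading the value back as a per-position penalty at search time (a payload-aware scorer) is likewise an assumption, not something the issue specifies.

```java
import java.nio.ByteBuffer;

public class BoostToPayloadSketch {

    // Hypothetical core of a BoostAttrToPayloadAttrFilter: turn a token's
    // boost into payload bytes. A missing boost (null) yields no payload.
    static byte[] boostToPayload(Float boost) {
        if (boost == null) {
            return null;
        }
        // 4 bytes: the float's IEEE 754 bits, big-endian (ByteBuffer default).
        return ByteBuffer.allocate(Float.BYTES).putFloat(boost).array();
    }

    // Decoding side: a payload-aware scorer (assumed) would read the bytes
    // back; an absent or short payload is treated as a neutral factor of 1.0.
    static float payloadToBoost(byte[] payload) {
        if (payload == null || payload.length < Float.BYTES) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] p = boostToPayload(0.5f);
        System.out.println(p.length);                               // 4
        System.out.println(payloadToBoost(p));                      // 0.5
        System.out.println(payloadToBoost(boostToPayload(null)));   // 1.0
    }
}
```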

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
