Hi Ahmet, maybe have a look at the SynonymQuery added in
https://issues.apache.org/jira/browse/LUCENE-6789

For query-time synonyms, it tries to approximate what would happen if
you did this work at index time instead: it creates a "pseudo-term"
(a disjunction of all terms at the same position) and sums the term
frequency across all matching terms before passing it to score(). On
the statistics side, it takes the maximum DF as the representative DF
and the sum of the TTFs as the representative TTF.
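
As a rough illustration of that arithmetic, here is a small standalone
sketch in plain Java. It is not Lucene's actual code: the class, the
TermStats holder, the method names and the sample numbers are all made
up for this example. It only shows how the pseudo-term's statistics are
combined (max of DF, sum of TTF, sum of per-document term frequency).

```java
import java.util.Arrays;
import java.util.List;

public class PseudoTermStats {

    /** Hypothetical holder for one synonym's collection statistics (not a Lucene class). */
    public static final class TermStats {
        public final String term;
        public final long docFreq;        // DF: number of documents containing the term
        public final long totalTermFreq;  // TTF: total occurrences across the collection

        public TermStats(String term, long docFreq, long totalTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Representative DF of the pseudo-term: the maximum DF over all synonyms. */
    public static long representativeDocFreq(List<TermStats> terms) {
        long max = 0;
        for (TermStats t : terms) {
            max = Math.max(max, t.docFreq);
        }
        return max;
    }

    /** Representative TTF of the pseudo-term: the sum of the TTFs of all synonyms. */
    public static long representativeTotalTermFreq(List<TermStats> terms) {
        long sum = 0;
        for (TermStats t : terms) {
            sum += t.totalTermFreq;
        }
        return sum;
    }

    /** Term frequency handed to the scorer for one document: the sum over matching synonyms. */
    public static int combinedFreq(int... freqsInDoc) {
        int sum = 0;
        for (int f : freqsInDoc) {
            sum += f;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Invented sample numbers for two synonyms at the same position.
        List<TermStats> synonyms = Arrays.asList(
                new TermStats("usa", 100, 250),
                new TermStats("america", 40, 90));

        System.out.println(representativeDocFreq(synonyms));       // prints 100
        System.out.println(representativeTotalTermFreq(synonyms)); // prints 340
        // A document containing "usa" twice and "america" once is scored with tf = 3.
        System.out.println(combinedFreq(2, 1));                    // prints 3
    }
}
```

The point of combining things this way is that the document is scored
once, as if a single term had matched, rather than once per synonym.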

I ran relevance experiments with this, and the results were better
than with the query currently generated (a BooleanQuery with coord
disabled), especially for scoring systems that don't do anything with
coord.


On Sun, Sep 20, 2015 at 1:56 PM, Ahmet Arslan <[email protected]> wrote:
> Hello,
>
> Assume that term t1 is expanded into multiple terms (at the same position)
> at both index time and query time.
> This is possible with KeywordRepeat, SynonymFilter, or filters that have a
> preserveOriginal option, for instance.
>
> When a two-term query (t1 t2) is executed, term t1 is boosted artificially.
> Score contribution of the term t1 is counted multiple times.
> It is as if the query were issued with boosts: t1^3 t2
> This behaviour boosts expanded terms and may not always be desired
> (e.g. when t2 is a content-bearing word).
>
> I think there should be a flag/switch analogous to the relationship
> between discountOverlaps and document length.
> With this control, the scores of overlapping query terms (tokens with a
> position increment of zero) would be counted only once.
> The remaining (non-overlapping) terms would not be affected.
>
> Bruno asked for this functionality in the past:
> http://find.searchhub.org/document/bb99e435ba35f2b1
>
> What do you think about this? How difficult would it be to implement?
> Would this be a Lucene or Solr issue?
>
> Thanks,
> Ahmet
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
