On Sun, Sep 20, 2015 at 6:52 PM, Ahmet Arslan <[email protected]> wrote:
> Hi Robert,
>
> As I understand, with SynonymQuery, all expansion is recommended to be
> performed on query time only,
> and SynonymQuery will take care of the below problem :
It's not that I recommend query-time expansion (vs index-time); it's
just that Lucene needed to deal with that option a little better than
before.
>
> "A query for text:TV will expand into (text:TV text:Television) and the lower
> docFreq for text:Television will give the documents that match "Television" a
> much higher score than docs that match "TV" comparably -- which may be
> somewhat counterintuitive to the client. Index time expansion (or reduction)
> will result in the same idf for all documents regardless of which term the
> original text contained."
That is correct. Additionally, if a document contains one instance of
TV and one instance of Television, the two term frequencies are added
up: the pair is treated as a single term with tf=2 for that document
and is sent to the similarity that way. So it tries to behave as if TV
and Television were one index term. This is important so that term
frequency normalization is applied correctly and reflects the
information gain of each additional occurrence.
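
To make the tf point concrete, here is a rough, Lucene-free sketch. It
assumes a BM25-style saturation with k1 = 1.2 and ignores length
normalization and idf; the class and method names are made up for
illustration and are not Lucene APIs.

```java
// Illustrative only: a BM25-style tf curve is ASSUMED here; the actual
// similarity in use may differ. Names are hypothetical, not Lucene APIs.
final class SynonymTfSketch {
    static final double K1 = 1.2; // assumed saturation parameter

    // BM25 term-frequency component with length normalization ignored (b = 0).
    static double tfPart(double tf) {
        return tf * (K1 + 1) / (tf + K1);
    }

    // Two independent term clauses: each tf is saturated on its own, then summed.
    static double separateTerms(int tfTv, int tfTelevision) {
        return tfPart(tfTv) + tfPart(tfTelevision);
    }

    // Synonym-style scoring: frequencies are summed first, then saturated once,
    // as if TV and Television were a single index term.
    static double mergedTerm(int tfTv, int tfTelevision) {
        return tfPart(tfTv + tfTelevision);
    }

    public static void main(String[] args) {
        System.out.println("separate = " + separateTerms(1, 1));
        System.out.println("merged   = " + mergedTerm(1, 1));
    }
}
```

Under these assumptions, a document with one TV and one Television gets
a tf component of about 1.375 when the frequencies are merged, versus
2.0 when the two terms are scored separately and added. The second
occurrence counts for less than the first, which is the normalization
described above.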
>
> At the end of the query analysis, if there are tokens at the same position, I
> need to create my SynonymQuery programmatically, right?
QueryBuilder (used by the query parsers) will generate a SynonymQuery
when it sees tokens with posInc=0 (stacked at the same position) in the
TokenStream.
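
The grouping idea can be sketched without Lucene as follows. This is a
simplified illustration, not QueryBuilder's actual code: the Token class
and groupByPosition method are hypothetical, and the token order (folded
form first, original at posInc=0) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration of stacking tokens by position increment.
// Token and groupByPosition are hypothetical names, not Lucene classes.
final class PosIncGroupingSketch {
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // A token with posInc > 0 starts a new position; a token with posInc == 0
    // is stacked on the previous position. A group holding more than one term
    // is what would become a single synonym clause.
    static List<List<String>> groupByPosition(List<Token> tokens) {
        List<List<String>> groups = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc > 0 || groups.isEmpty()) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(t.term);
        }
        return groups;
    }

    public static void main(String[] args) {
        // "foo bör" after folding with preserveOriginal (order assumed).
        List<Token> tokens = Arrays.asList(
            new Token("foo", 1),
            new Token("bor", 1),
            new Token("bör", 0));
        System.out.println(groupByPosition(tokens));
    }
}
```

Here "foo" ends up alone in the first group, while "bor" and "bör" share
the second group, so they would be scored together rather than as two
separate clauses.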
>
> Let me explain my concern with another example:
>
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
> </analyzer>
>
>
> With above analyzer, the query "foo bör" will boost the term "bör" for no
> reason.
> Just because bör will be expanded into two terms : bor and bör.
> Its contribution to total score is counted two times. I think this is very
> trappy.
>
> With SynonymQuery solution, I will index with StandardTokenizer only.
> No expansion at index time.
> I will construct the query : new TermQuery('foo') + new SynonymQuery('bor',
> 'bör');
Yes, that is exactly it.