Re: comparing lucene scores across queries

Patrick Diviacco Tue, 29 Mar 2011 01:58:44 -0700

Hey Uwe, so from your last answer I understand I'm done: no need to do
anything, I can already compare the queries.


However, there is actually a misunderstanding: my BooleanQueries have a
variable number of boolean clauses, because the fields are fixed but the
terms per field are not. So, for example, I have:

BooleanQuery1:
field1:term, SHOULD
field1:term, SHOULD
field2:term, SHOULD
field2:term, SHOULD
field2:term, SHOULD
field3:term, SHOULD

BooleanQuery2:
field1:term, SHOULD
field2:term, SHOULD
field3:term, SHOULD
field3:term, SHOULD
field3:term, SHOULD
field3:term, SHOULD
field3:term, SHOULD

Are any of the points we discussed so far no longer valid?
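
To make the concern concrete, here is a minimal sketch (plain Java, no
Lucene dependency) of how the query norm changes with the number of clauses.
It assumes the DefaultSimilarity formula, queryNorm = 1 / sqrt(sumOfSquaredWeights),
with each TermQuery contributing (idf * boost)^2. The class name, method name
and idf values are made up for illustration only.

```java
public class QueryNormSketch {

    // Assumed formula (DefaultSimilarity): 1 / sqrt(sum of squared term weights),
    // where each term weight is idf * boost.
    static double queryNorm(double[] idfs, double boost) {
        double sumOfSquaredWeights = 0.0;
        for (double idf : idfs) {
            double weight = idf * boost;
            sumOfSquaredWeights += weight * weight;
        }
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // Hypothetical idf values: every term gets idf = 2.0 so that only
        // the clause count differs between the two queries.
        double[] query1 = {2.0, 2.0, 2.0, 2.0, 2.0, 2.0};      // 6 SHOULD clauses
        double[] query2 = {2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0}; // 7 SHOULD clauses

        System.out.println("queryNorm(query1) = " + queryNorm(query1, 1.0));
        System.out.println("queryNorm(query2) = " + queryNorm(query2, 1.0));
    }
}
```

With identical idf values, the query with more clauses gets the smaller norm,
so the factor differs between the two example queries.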

thanks

On 29 March 2011 10:48, Uwe Schindler <u...@thetaphi.de> wrote:

> > Thanks for your reply. I thought I had solved the issue: according to Uwe,
> > the queries without the coord function were reasonably comparable, but now
> > you have actually reopened it.
> >
> > So, I need to be sure I'm making them comparable, and I would like to ask
> > the following.
> >
> > My BooleanQueries have a similar structure. Important: they only contain
> > TermQueries. There are always 3 fields, but the number of terms can vary.
> > This is an example of a BooleanQuery (sorry for the syntax):
> >
> > field1:term1, SHOULD
> > field1:term2, SHOULD
> > field2:term1, SHOULD
> > field2:term2, SHOULD
> > field2:term3, SHOULD
> > field3:term1, SHOULD
> > ...
> >
> > If it is not clear what the BooleanQueries look like, I can print some of
> > them for you. They have the same number of fields but different numbers
> > of terms.
> >
> > 1- Do you still think QueryNorm is not an issue? Funny, because in the
> > documentation I can read:
> > QueryNorm(q) is a normalizing factor used to make scores between queries
> > comparable. This factor does not affect document ranking (since all
> > ranked documents are multiplied by the same factor), but rather just
> > attempts to make scores from different queries (or even different
> > indexes) comparable.
> >
> > From the documentation, it seems I can compare queries.
>
> But as you are always using the same type of query (TermQuery), the
> QueryNorm should not change, so it is no issue at all. It differs if you
> have a variable number of Boolean clauses; in that case the query norm
> could help you make the queries comparable. But if you always have the
> same-looking BQ with the exact same number of TQs in it (only different
> terms), it's not an issue at all. In all other cases, the query norm helps
> to compare, e.g., a BQ with 5 TQ clauses with another BQ that has 8 TQ
> clauses.
>
> > 2- I don't think I'm using query boosts. Are they enabled by default in
> > the BooleanQuery?
>
> Query boosts are only active if you call TermQuery.setBoost(anything != 1.0f).
>
> > 3- FieldNorm is not mentioned in the Similarity class. How can I disable
> > it? Should I disable it? Is it an issue?
>
> FieldNorm should not be a problem, as it's an indexed feature. So the same
> document always has the same FieldNorm (which is a combination of length
> norm and index-time document boost). If two queries hit the same document,
> the scores for this document should be comparable, as the FieldNorm is the
> same in both cases.
>
> See point 6) in the Similarity docs: norm(t,d)
>
> > 4- If I'm not wrong, Uwe told me I can compute comparable cosine
> > similarities even with documents of different lengths. Tf and Idf are
> > unbounded, and my docs have different lengths. Can't I measure the
> > similarity between query and doc vectors anyway?
>
> The field norm normalizes that. So where is the problem?
>
> > 5 - Again, I've been told I can compare queries, and from the
> > documentation I can see that the queryNorm factor normalizes all queries.
> > But you are saying I should manually normalize them somehow? It is not
> > clear.
>
> It only affects differing queries (e.g. when the number of Boolean clauses
> differs or the types of queries differ).
>
> Uwe
