RE: idf on per-field basis

Moray McConnachie Fri, 11 Dec 2009 01:21:58 -0800

Michael wrote:

"so I wouldn't worry too much about making a special build for yourself
with a few changes."

We did this a few versions back to fix a couple of bugs and add some
functionality around sorting. It worked absolutely fine, but it can be a
bit of a pain to maintain, depending on how much time you have to spend
on Lucene and how much that area of the Lucene code base changes in
subsequent releases.

Yours,
Moray

------------------------------------- 
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com

-----Original Message-----
From: Michael Garski [mailto:mgar...@myspace-inc.com] 
Sent: 10 December 2009 18:10
To: lucene-net-user@incubator.apache.org
Subject: RE: idf on per-field basis

Artem,

I've made modifications to the internals of Lucene.Net to achieve
modifications to scoring, specifically in being able to manually specify
the length norm for a field, which allowed me to retain positional
information while injecting multi-term synonyms, so I wouldn't worry too
much about making a special build for yourself with a few changes.
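
As a rough illustration of why a manually specified length norm matters
when synonyms are injected, here is a minimal Python sketch. It is not
Lucene.Net code and not necessarily how the change above was made:
field_norm and its explicit_length parameter are hypothetical, and the
sample tokens are invented. The only thing taken from Lucene is the
classic default length norm, 1/sqrt(numTerms).

```python
import math

def length_norm(num_terms):
    # Classic default length norm: 1 / sqrt(number of terms in the field).
    return 1.0 / math.sqrt(num_terms)

def field_norm(tokens, explicit_length=None):
    # Hypothetical helper: use the caller-supplied length when given,
    # otherwise fall back to the number of tokens actually indexed.
    n = explicit_length if explicit_length is not None else len(tokens)
    return length_norm(n)

# Invented example field, before and after injecting a two-term synonym.
original = ["new", "york", "pizza"]
expanded = original + ["big", "apple"]

inflated = field_norm(expanded)                                  # norm over 5 tokens
preserved = field_norm(expanded, explicit_length=len(original))  # norm over 3 tokens
```

Without the override, the injected synonym tokens make the field look
longer and lower its norm; with it, the field keeps the norm of the
original text.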

Would using a QueryFilter in conjunction with a query work?  The
QueryFilter would be used on the fields for which scoring information is
not necessary, while the other fields would be queried with the specific
query you need.
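
To make the idea concrete, here is a small self-contained Python sketch
of "filter on some fields, score on others". It does not use Lucene.Net:
the in-memory DOCS list, the field names, and the search function are
all made up for illustration, and the scoring is a simplified
sqrt(tf) * idf rather than Lucene's full formula.

```python
import math

# Toy in-memory "index": each document maps a field name to a list of terms.
DOCS = [
    {"category": ["books"], "body": ["lucene", "scoring", "lucene"]},
    {"category": ["books"], "body": ["cooking"]},
    {"category": ["music"], "body": ["lucene"]},
]

def idf(doc_freq, num_docs):
    # Classic Lucene-style idf: log(numDocs / (docFreq + 1)) + 1.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def term_score(docs, doc, field, term):
    # Simplified score for one term in one field: sqrt(tf) * idf.
    tf = math.sqrt(doc[field].count(term))
    doc_freq = sum(1 for d in docs if term in d[field])
    return tf * idf(doc_freq, len(docs))

def search(docs, filter_field, filter_term, query_field, query_term):
    hits = []
    for doc_id, doc in enumerate(docs):
        # Filter step: a yes/no test that contributes nothing to the score.
        if filter_term not in doc[filter_field]:
            continue
        # Query step: only the scored field influences ranking.
        s = term_score(docs, doc, query_field, query_term)
        if s > 0:
            hits.append((doc_id, s))
    return sorted(hits, key=lambda h: h[1], reverse=True)

hits = search(DOCS, "category", "books", "body", "lucene")
```

Here only document 0 is returned: document 2 matches the query term but
is rejected by the filter, and the "category" field never affects the
score of the documents that pass.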

Michael

-----Original Message-----
From: Artem Chereisky [mailto:a.cherei...@gmail.com]
Sent: Thursday, December 10, 2009 1:40 AM
To: lucene-net-user@incubator.apache.org
Cc: <lucene-net-user@incubator.apache.org>
Subject: Re: idf on per-field basis

Michael, thank you.

Query filter only solves half of my problem. Unfortunately I do need to
have a proper score for some fields.

I ended up extending the Term class (I removed the sealed modifier, which
is a bad thing). The new myTerm class has one boolean member, omitIdf.
Then, when I compile my queries, I use myTerm with omitIdf set to true
for some fields. Then I extended the Similarity class: I cast the Term
passed into the Idf method to myTerm and only calculate Idf if omitIdf
is false. Seems to work.
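
In outline, the approach looks something like the Python sketch below.
This is only a model of the idea, not the actual Lucene.Net change: the
Term, MyTerm, DefaultSimilarity and PerTermIdfSimilarity classes here
are stand-ins written for this example, and the idf(term, doc_freq,
num_docs) signature is an assumption chosen to keep the sketch short.

```python
import math

class Term:
    # Stand-in for a term: a field name plus the term text.
    def __init__(self, field, text):
        self.field = field
        self.text = text

class MyTerm(Term):
    # Term that carries a flag telling the similarity to skip IDF.
    def __init__(self, field, text, omit_idf=False):
        super().__init__(field, text)
        self.omit_idf = omit_idf

class DefaultSimilarity:
    def idf(self, term, doc_freq, num_docs):
        # Classic Lucene-style idf: log(numDocs / (docFreq + 1)) + 1.
        return math.log(num_docs / (doc_freq + 1)) + 1.0

class PerTermIdfSimilarity(DefaultSimilarity):
    def idf(self, term, doc_freq, num_docs):
        # A neutral factor of 1.0 when the term asks for IDF to be omitted;
        # plain Term instances fall through to the default calculation.
        if getattr(term, "omit_idf", False):
            return 1.0
        return super().idf(term, doc_freq, num_docs)

sim = PerTermIdfSimilarity()
flat = sim.idf(MyTerm("tag", "red", omit_idf=True), 5, 1000)
weighted = sim.idf(MyTerm("body", "red"), 5, 1000)
```

Checking the flag with getattr rather than an unconditional cast means
ordinary Term instances still get the default IDF.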

I don't like the solution but that's the best I could do today.

Any thoughts?

Regards,
Artem

On 10/12/2009, at 15:51, Michael Garski <mgar...@myspace-inc.com> wrote:

> Artem,
>
> Do you need any scoring information at all on that field?  How about 
> using a QueryFilter for those fields?
>
> Michael
>
>
> -----Original Message-----
> From: Artem Chereisky [mailto:a.cherei...@gmail.com]
> Sent: Wed 12/9/2009 4:53 PM
> To: lucene-net-user@incubator.apache.org; 
> lucene-net-develo...@incubator.apache.org
> Subject: idf on per-field basis
>
> Hi,
>
> I came across a situation where my scores are adversely affected by the
> IDF component. Let me explain.
>
> My index documents contain a number of fields. For some, TF and IDF
> are important and need to be taken into account; for others, neither TF
> nor IDF should apply. I dealt with TF by omitting norms during indexing
> but I can't find a way to calculate IDF for certain fields only.
>
> The formula for IDF is defined in Similarity. I have my own 
> implementation of Similarity where I can set it to 1 or use the 
> default implementation.
> mySearcher.SetSimilarity is where I tell Lucene which similarity 
> instance to use, but that's global, so it applies to all fields in the
> index.
>
> So, here's my question. Is there a way to calculate IDF on per-field 
> basis?
>
> Regards,
> Art
>
>
