Re: idf on per-field basis

Artem Chereisky Thu, 10 Dec 2009 14:04:00 -0800

Michael,

QueryFilter is certainly the way to go for fields that don't require
scoring. Thanks for that.
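
To make the filter-versus-query split concrete, here is a minimal, self-contained sketch in plain Java. It is an illustration only, not Lucene or Lucene.Net code: the FilterThenScore class, the search method, and the field names are hypothetical, and sqrt(tf) merely stands in for a real scoring function. The filter decides which documents are eligible without touching the score; only the scored field contributes to ranking.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of this is Lucene or Lucene.Net API.
final class FilterThenScore {

    // Counts occurrences of a term in a whitespace-tokenized field value.
    static int termFreq(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Returns docId -> score for documents that pass the filter and match the term.
    static Map<Integer, Double> search(List<Map<String, String>> docs,
                                       String filterField, String filterValue,
                                       String scoredField, String term) {
        Map<Integer, Double> hits = new LinkedHashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            Map<String, String> doc = docs.get(id);
            // Filter step: a yes/no match that contributes nothing to the score.
            if (!filterValue.equals(doc.get(filterField))) {
                continue;
            }
            // Scoring step: only the scored field influences the result.
            int tf = termFreq(doc.getOrDefault(scoredField, ""), term);
            if (tf > 0) {
                hits.put(id, Math.sqrt(tf));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("category", "books", "body", "lucene lucene search"),
            Map.of("category", "music", "body", "lucene"),
            Map.of("category", "books", "body", "search"));
        // Only documents in category "books" are eligible; "body" alone is scored.
        System.out.println(search(docs, "category", "books", "body", "lucene"));
    }
}
```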


Everyone,

Regarding making modifications to Lucene core and/or extending Lucene's
classes, what's the best practice for managing the changes?
I keep a Lucene repository under TortoiseSVN pointing to
https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_4_0.
Then every time I make a core change or extend a Lucene class, I copy the
files involved into a separate folder structure which is part of another SVN
repository. That way I can source control my changes. The process is a bit
cumbersome. Is there a better way?

Regards,
Artem
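
P.S. For anyone following the thread, the omitIdf approach described in the quoted message below can be sketched roughly as follows. This is a standalone illustration in plain Java, not Lucene or Lucene.Net code: IdfSketch, SketchTerm, FlaggedTerm, and SketchSimilarity are hypothetical stand-ins for Term, myTerm, and a Similarity subclass. The IDF formula used, 1 + ln(numDocs / (docFreq + 1)), is assumed to match the default similarity.

```java
// Hypothetical stand-ins; none of these are Lucene or Lucene.Net classes.
final class IdfSketch {

    // Plays the role of Term: a field name plus the term text.
    static class SketchTerm {
        final String field;
        final String text;

        SketchTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Plays the role of myTerm: a term that carries an omitIdf flag.
    static class FlaggedTerm extends SketchTerm {
        final boolean omitIdf;

        FlaggedTerm(String field, String text, boolean omitIdf) {
            super(field, text);
            this.omitIdf = omitIdf;
        }
    }

    // Plays the role of the custom Similarity.
    static class SketchSimilarity {
        // Returns a neutral 1.0 when the term opts out of IDF,
        // otherwise 1 + ln(numDocs / (docFreq + 1)).
        double idf(SketchTerm term, int docFreq, int numDocs) {
            if (term instanceof FlaggedTerm && ((FlaggedTerm) term).omitIdf) {
                return 1.0;
            }
            return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
        }
    }

    public static void main(String[] args) {
        SketchSimilarity sim = new SketchSimilarity();
        // Flagged term: IDF is skipped, so the weight is neutral.
        System.out.println(sim.idf(new FlaggedTerm("category", "books", true), 3, 1000));
        // Ordinary term: the usual IDF applies.
        System.out.println(sim.idf(new SketchTerm("body", "lucene"), 3, 1000));
    }
}
```

Keying the decision on the field name instead of a flag on the term would avoid having to unseal Term at all, at the cost of hard-wiring the list of fields into the similarity.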



On Fri, Dec 11, 2009 at 5:09 AM, Michael Garski <mgar...@myspace-inc.com> wrote:

> Artem,
>
> I've modified the internals of Lucene.Net to change scoring, specifically
> to be able to manually specify the length norm for a field, which allowed
> me to retain positional information while injecting multi-term synonyms,
> so I wouldn't worry too much about making a special build for yourself
> with a few changes.
>
> Would using a QueryFilter in conjunction with a query work?  The
> QueryFilter would be used on fields for which scoring information is not
> necessary, while the other fields would be queried with the specific query
> you need.
>
> Michael
>
> -----Original Message-----
> From: Artem Chereisky [mailto:a.cherei...@gmail.com]
> Sent: Thursday, December 10, 2009 1:40 AM
> To: lucene-net-user@incubator.apache.org
> Cc: <lucene-net-user@incubator.apache.org>
> Subject: Re: idf on per-field basis
>
> Michael, thank you.
>
> Query filter only solves half of my problem. Unfortunately I do need
> to have a proper score for some fields.
>
> I ended up extending the Term class (I had to remove its sealed modifier,
> which is a bad thing). The new myTerm class has one boolean member, omitIdf.
> Then, when I compile my queries, I use myTerm with omitIdf set to
> true for some fields. I also extended the Similarity class: I cast the
> Term passed into the Idf method to myTerm and only calculate Idf if
> omitIdf is false. Seems to work.
>
> I don't like the solution but that's the best I could do today.
>
> Any thoughts?
>
> Regards,
> Artem
>
>
> On 10/12/2009, at 15:51, Michael Garski <mgar...@myspace-inc.com> wrote:
>
> > Artem,
> >
> > Do you need any scoring information at all on that field?  How about
> > using a QueryFilter for those fields?
> >
> > Michael
> >
> >
> > -----Original Message-----
> > From: Artem Chereisky [mailto:a.cherei...@gmail.com]
> > Sent: Wed 12/9/2009 4:53 PM
> > To: lucene-net-user@incubator.apache.org; lucene-net-develo...@incubator.apache.org
> > Subject: idf on per-field basis
> >
> > Hi,
> >
> > I came across a situation where my scores are adversely affected by
> > the IDF component. Let me explain.
> >
> > My index documents contain a number of fields. For some, TF and IDF
> > are important and need to be taken into account; for others, neither TF
> > nor IDF should apply. I dealt with TF by omitting norms during indexing,
> > but I can't find a way to calculate IDF for certain fields only.
> >
> > The formula for IDF is defined in Similarity. I have my own
> > implementation of Similarity where I can set it to 1 or use the default
> > implementation. mySearcher.SetSimilarity is where I tell Lucene which
> > similarity instance to use, but that's global, so it applies to all
> > fields in the index.
> >
> > So, here's my question. Is there a way to calculate IDF on a per-field
> > basis?
> >
> > Regards,
> > Art
> >
> >
>
>
