RE: Issue with Boosting Fields

ian.mcnaney Wed, 18 Oct 2006 14:14:41 -0700

Hi Gal,
  It looks like you're correct about the scoring filters overriding the
document boost (see the bottom of org.apache.nutch.indexer.Indexer's
reduce method).  That's not something I've dealt with before, so I'm
glad you mentioned it.  I learn something new every day.
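Here is a toy Java model of the order of operations. All names in it (Doc, ScoringStep, index) are hypothetical stand-ins, not Nutch's or Lucene's API. The sketch assumes only two things: indexing filters run first, and the score returned by the scoring filters is then applied with a single setBoost call. That call replaces any document boost set earlier.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Toy model (hypothetical classes, not Nutch's API): indexing filters run
 * first, then the scoring-filter result is written over the document boost.
 */
public class BoostOverrideDemo {

    /** Hypothetical stand-in for a Lucene Document; only the boost matters here. */
    static class Doc {
        private float boost = 1.0f;
        float getBoost() { return boost; }
        void setBoost(float b) { boost = b; }
    }

    /** Hypothetical scoring filter: maps an incoming score to an outgoing one. */
    interface ScoringStep {
        float indexerScore(Doc doc, float initScore);
    }

    static Doc index(List<UnaryOperator<Doc>> indexingFilters, List<ScoringStep> scoringFilters) {
        Doc doc = new Doc();
        // 1. Indexing filters run first; one of them may set a document boost.
        for (UnaryOperator<Doc> f : indexingFilters) {
            doc = f.apply(doc);
        }
        // 2. Scoring filters start from 1.0 and chain their results.
        float boost = 1.0f;
        for (ScoringStep s : scoringFilters) {
            boost = s.indexerScore(doc, boost);
        }
        // 3. The final score replaces whatever boost the indexing filters set.
        doc.setBoost(boost);
        return doc;
    }

    public static void main(String[] args) {
        UnaryOperator<Doc> boostingFilter = d -> { d.setBoost(5.0f); return d; };
        ScoringStep linkScore = (d, init) -> init * 2.0f; // ignores d.getBoost()
        Doc doc = index(List.of(boostingFilter), List.of(linkScore));
        System.out.println(doc.getBoost()); // prints 2.0, not 10.0
    }
}
```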


  Keep in mind that each field in a document also has a boost, and when
that document is written to the index, the field-level boost and the
document-level boost are combined into the field norms.  See Lucene
1.9.1 org.apache.lucene.index.DocumentWriter's addDocument(String
segment, Document doc) method.  The algorithm for boost handling as a
document gets written to the index is essentially as follows:
for each field in fields:
  float norm = document.getBoost() * field.getBoost() * someOtherStuff
  storeToIndex(mindBendingEncoding(norm))
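The Java sketch below renders that loop for a single field and computes the value before encoding. It assumes the "someOtherStuff" factor is a length normalization of 1/sqrt(numTerms), which is the formula used by Lucene's DefaultSimilarity.lengthNorm. The class and method names are hypothetical, and the one-byte encoding step is left out.

```java
/**
 * Sketch of the per-field norm computation, assuming the extra factor is a
 * length normalization of 1/sqrt(numTerms). Encoding to one byte is omitted.
 */
public class FieldNormSketch {

    /** Length normalization: shorter fields get a larger factor. */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Combined value that would be stored (before encoding) for one field. */
    static float fieldNorm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A 4-term field boosted 2.0 in a document boosted 1.5:
        System.out.println(fieldNorm(1.5f, 2.0f, 4)); // 1.5 * 2.0 * 0.5 = 1.5
    }
}
```

The document boost and field boost only ever appear as a product, so swapping them yields the same stored value.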
  
  Ultimately, the value stored in the field norm is a combination of a
lot of things, and you can't determine what the original document and
field boosts were from this value alone.  That's why Lucene's
IndexReader/Searcher and Luke (which are unaware of Nutch's stored boost
field) don't even bother trying to recover them.  When you retrieve a
document from an index, the boosts of all of the fields, and the boost
of the document itself, are set to the default of 1.0.  The boosts
aren't gone; they're still there working away on the ordering of
results.  You just can't retrieve them individually.
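This sketch shows why the individual boosts can't be recovered. It assumes a lossy float-to-byte packing with 3 mantissa bits and 5 exponent bits. That is similar in spirit to Lucene's norm encoding but written here only for illustration, and NormRecoveryDemo, encode and decode are hypothetical names. Two different document/field boost splits with the same product land on the same byte, and nearby values such as 1.0 and 1.1 collapse together.

```java
/**
 * Illustration of why original boosts cannot be read back from a norm.
 * The packing is a sketch (3 mantissa bits, 5 exponent bits), not a copy
 * of Lucene's implementation.
 */
public class NormRecoveryDemo {

    private static final int ZERO_EXP = (63 - 15) << 3; // offset mapping the float exponent into 5 bits

    /** Packs a float into one byte, truncating the mantissa to 3 bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);               // sign, exponent, top 3 mantissa bits
        if (small <= ZERO_EXP) {
            // non-positive input -> 0; tiny positive input -> smallest nonzero code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_EXP + 0x100) {
            return (byte) -1;                       // overflow: largest code (0xFF)
        }
        return (byte) (small - ZERO_EXP);
    }

    /** Inverse of encode: expands the byte back to a (rounded) float. */
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Different document/field boost splits, same product, same stored byte.
        byte a = encode(2.0f * 1.0f);   // docBoost 2.0, fieldBoost 1.0
        byte b = encode(1.0f * 2.0f);   // docBoost 1.0, fieldBoost 2.0
        System.out.println(a == b);                 // true
        // The encoding is also lossy: 1.1 comes back as 1.0.
        System.out.println(decode(encode(1.1f)));   // 1.0
    }
}
```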

Best,
Ian

-----Original Message-----
From: Gal Nitzan [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 18, 2006 11:12 AM
To: [email protected]
Subject: RE: Issue with Boosting Fields



If I remember correctly, the scoring plug-in overrides the boost value
that was set in the indexing filter. The wiki sample is too old :(. You
need to write your own scoring plug-in (which is very simple, BTW).
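A custom scoring plug-in along those lines might look like the sketch below. ScoringStep and UrlBoostScoring are hypothetical stand-ins, not Nutch's ScoringFilter interface, so check the real signature against the Nutch version in use. The filter multiplies its incoming score by the desired boost instead of replacing it. That keeps the boost in the value that ends up as the document boost.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: each scoring step receives the previous step's score
 * and returns a new one; the final score becomes the document boost.
 * ScoringStep is an invented stand-in, not Nutch's ScoringFilter.
 */
public class CustomScoringSketch {

    interface ScoringStep {
        float indexerScore(String url, float initScore);
    }

    /** Hypothetical custom filter: multiplies in a per-URL boost instead of replacing the score. */
    static class UrlBoostScoring implements ScoringStep {
        private final Map<String, Float> boosts;

        UrlBoostScoring(Map<String, Float> boosts) { this.boosts = boosts; }

        @Override
        public float indexerScore(String url, float initScore) {
            // Preserve whatever earlier filters computed; unknown URLs get a factor of 1.0.
            return initScore * boosts.getOrDefault(url, 1.0f);
        }
    }

    /** Runs the chain starting from 1.0. */
    static float finalBoost(List<ScoringStep> chain, String url) {
        float score = 1.0f;
        for (ScoringStep step : chain) {
            score = step.indexerScore(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        ScoringStep linkScore = (url, init) -> init * 2.0f; // stand-in for an existing filter
        ScoringStep custom = new UrlBoostScoring(Map.of("http://example.com/a", 3.0f));
        System.out.println(finalBoost(List.of(linkScore, custom), "http://example.com/a")); // 6.0
        System.out.println(finalBoost(List.of(linkScore, custom), "http://example.com/b")); // 2.0
    }
}
```

Each step multiplies rather than assigns, so the order of the chain does not affect the result in this toy model.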

Gal.


-----Original Message-----
From: Paul Ramirez [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 17, 2006 10:31 PM
To: [email protected]
Subject: Issue with Boosting Fields

Hi All,

Recently I downloaded Nutch and have been using it. I am building a
custom application that required me to write a parse plugin and an
index plugin. After getting these to work, I decided I wanted to boost
some of the fields I was indexing, so I followed the Wiki example
(http://wiki.apache.org/nutch/WritingPluginExample) that describes
creating an Indexer extension that adjusts the boost value of a field.
However, after following this example and running a crawl, the fields
do not show up as boosted in the index. I am using Luke to examine the
index produced by Nutch; instead of the boosted values, all fields show
the default boost of 1.0. Is there some configuration that overrides
the IndexerPlugin setting the boost on a per-field basis?

Thanks,
Paul Ramirez
