Hi Gal,

It looks like you're correct about the scoring filters overriding the document boost (see the bottom of org.apache.nutch.indexer.Indexer's reduce method). That's not something I've dealt with before, so I'm glad you mentioned it. I learn something new every day.

Keep in mind that each field in a document also has a boost, and when that document is written to the index, the field-level boost and the document-level boost are combined into the field norms. See Lucene 1.9.1 org.apache.lucene.index.DocumentWriter's addDocument(String segment, Document doc) method. The algorithm for boost handling as a document gets written to the index is essentially as follows:

    for field in fields:
        float norms = document.getBoost() * field.getBoost() * someOtherStuff
        storeToIndex(mindBendingEncoding(norms))

Ultimately, the value stored in the field norm is a combination of a lot of things, and you can't determine what the original document and field boosts were from this value alone. That's why Lucene's IndexReader/Searcher and Luke (which are unaware of Nutch's stored boost field) don't even bother trying to recover them. When you retrieve a document from an index, the boosts of all of the fields, and the boost of the document itself, are set to the default of 1.0. The boosts aren't gone; they're still there, working away on the ordering of results, but you can't retrieve them individually.

Best,
Ian

-----Original Message-----
From: Gal Nitzan [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 18, 2006 11:12 AM
To: [email protected]
Subject: RE: Issue with Boosting Fields

If I remember correctly, the scoring plug-in overrides the boost value that was set in the index filter. The wiki sample is too old :(. You need to write your own scoring plug-in (which is very simple BTW).

Gal.

-----Original Message-----
From: Paul Ramirez [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 10:31 PM
To: [email protected]
Subject: Issue with Boosting Fields

Hi All,

I recently downloaded Nutch and have been using it. I am building a custom application that has required me to write a parse and index plugin.
After getting these to work, I decided I wanted to boost some of the fields I was indexing, so I followed the Wiki example (http://wiki.apache.org/nutch/WritingPluginExample) that talks about creating an Indexer extension that adjusts the boosted value of a field. However, when I run a crawl after following this example, the fields do not show up as boosted in the index. I am using Luke to examine the index produced by Nutch. Instead, all fields have the default boost of 1.0. Is there some configuration that overrides the IndexerPlugin setting the boost on a field-level basis?

Thanks,
Paul Ramirez
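
For anyone reading this thread later, here is a rough, self-contained sketch of why the original boosts cannot be read back from a stored norm, as described in Ian's reply above. It is not Lucene's code: the class NormSketch, the 1/sqrt(numTerms) stand-in for "someOtherStuff", and the one-byte encoding are all assumptions made for illustration, and Lucene's real encoding differs in detail.

```java
public class NormSketch {

    // Assumed stand-in for "someOtherStuff": a length normalization
    // that shrinks as the field gets longer.
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Document boost, field boost, and length norm collapse into one float.
    public static float combine(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    // Simplified lossy one-byte encoding (hypothetical, NOT Lucene's format):
    // clamp to [0, 15.9375] and keep 16 steps per unit.
    public static byte encode(float norm) {
        float clamped = Math.max(0f, Math.min(norm, 15.9375f));
        return (byte) Math.round(clamped * 16f);
    }

    // Inverse of encode, up to the precision lost when rounding.
    public static float decode(byte b) {
        return (b & 0xff) / 16f;
    }

    public static void main(String[] args) {
        // Three different (document boost, field boost) pairs, same field length.
        byte a = encode(combine(2.0f, 3.0f, 4));
        byte b = encode(combine(3.0f, 2.0f, 4));
        byte c = encode(combine(6.0f, 1.0f, 4));

        System.out.println("same stored byte: " + (a == b && b == c));
        System.out.println("decoded norm: " + decode(a));
    }
}
```

All three boost pairs end up as the same stored byte, so nothing reading the index can tell which pair produced it. That is consistent with a retrieved document reporting the default boost of 1.0 even though the norm still influences ranking.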
