Author: ab
Date: Fri Sep 22 14:49:09 2006
New Revision: 449102

URL: http://svn.apache.org/viewvc?view=rev&rev=449102
Log:
NUTCH-332: fix the problem of doubling scores caused by links pointing to the current page (e.g. anchors).
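
For illustration only, a minimal standalone sketch of the idea behind this fix: outlinks that resolve to the page they were found on are dropped before they can contribute to that page's score. The class name SelfLinkFilter and the helpers stripFragment and filterSelfLinks are hypothetical and are not part of Nutch. The committed change compares fromUrl and toUrl with a plain equals(); stripping the "#fragment" explicitly, as done here, is an assumption made so that the sketch handles in-page anchors on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelfLinkFilter {

    // Hypothetical helper (not a Nutch API): remove a trailing "#fragment"
    // so an in-page anchor compares equal to the page URL itself.
    public static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash >= 0 ? url.substring(0, hash) : url;
    }

    // Returns the outlinks that do not point back at fromUrl.
    // Null entries (links that failed earlier processing) are skipped.
    public static List<String> filterSelfLinks(String fromUrl, List<String> outlinks) {
        String from = stripFragment(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String toUrl : outlinks) {
            if (toUrl == null) {
                continue; // already rejected upstream
            }
            if (from.equals(stripFragment(toUrl))) {
                continue; // link to self, or an anchor within the same page
            }
            kept.add(toUrl);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> outlinks = Arrays.asList(
            "http://example.com/a",      // link to self
            "http://example.com/a#top",  // anchor within the page
            "http://example.com/b");     // genuine outlink
        System.out.println(filterSelfLinks("http://example.com/a", outlinks));
    }
}
```

In this sketch only the genuine outlink survives, so a page no longer passes score to itself through its own anchors.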

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?view=diff&rev=449102&r1=449101&r2=449102
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Fri Sep 22 14:49:09 2006
@@ -29,6 +29,9 @@
 
 10. NUTCH-367 - DistributedSearch thown ClassCastException (siren)
 
+11. NUTCH-332 - Fix the problem of doubling scores caused by links pointing
+    to the current page (e.g. anchors). (Stefan Groschupf via ab)
+
 Release 0.8 - 2006-07-25
 

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java?view=diff&rev=449102&r1=449101&r2=449102
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java Fri Sep 22 14:49:09 2006
@@ -121,6 +121,8 @@
         } catch (Exception e) {
           toUrl = null;
         }
+        // ignore links to self (or anchors within the page)
+        if (fromUrl.equals(toUrl)) toUrl = null;
         if (toUrl != null) validCount++;
         toUrls[i] = toUrl;
       }

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs