[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592936#comment-14592936 ]
Chris A. Mattmann commented on NUTCH-2039:
------------------------------------------

Two immediate things:

1. Patch includes old commits (Sujen you need to not include these). Workaround:

{noformat}
[chipotle:~/tmp/nutch-trunk] mattmann% git apply --exclude "*IndexingJob*" --exclude "*JobFactory*" < 30.patch
<stdin>:63: trailing whitespace.
    boolean deleteGone = false;
<stdin>:90: trailing whitespace.
      }
<stdin>:92: trailing whitespace.
<stdin>:100: trailing whitespace.
<stdin>:104: trailing whitespace.
      File[] segmentsList = segmentsDir.listFiles();
warning: squelched 52 whitespace errors
warning: 57 lines add whitespace errors.
[chipotle:~/tmp/nutch-trunk] mattmann% svn add src/plugin/scoring-similarity
A         src/plugin/scoring-similarity
A         src/plugin/scoring-similarity/ivy.xml
A         src/plugin/scoring-similarity/src
A         src/plugin/scoring-similarity/src/java
A         src/plugin/scoring-similarity/src/java/org
A         src/plugin/scoring-similarity/src/java/org/apache
A         src/plugin/scoring-similarity/src/java/org/apache/nutch
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/Cosine
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/Cosine/DocumentVector.java
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/Cosine/CosineSimilarityModel.java
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/DocumentVector.java
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/SimilarityScoringFilter.java
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/CosineSimilarity.java
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/SimilarityModel.java
A         src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/ScoringFilterModel.java
A         src/plugin/scoring-similarity/plugin.xml
A         src/plugin/scoring-similarity/build.xml
[chipotle:~/tmp/nutch-trunk] mattmann% svn status
M       build.xml
M       default.properties
M       src/plugin/build.xml
A       src/plugin/scoring-similarity
A       src/plugin/scoring-similarity/build.xml
A       src/plugin/scoring-similarity/ivy.xml
A       src/plugin/scoring-similarity/plugin.xml
A       src/plugin/scoring-similarity/src
A       src/plugin/scoring-similarity/src/java
A       src/plugin/scoring-similarity/src/java/org
A       src/plugin/scoring-similarity/src/java/org/apache
A       src/plugin/scoring-similarity/src/java/org/apache/nutch
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/Cosine
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/Cosine/CosineSimilarityModel.java
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/Cosine/DocumentVector.java
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/CosineSimilarity.java
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/DocumentVector.java
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/ScoringFilterModel.java
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/SimilarityModel.java
A       src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/SimilarityScoringFilter.java
{noformat}

2. the package for .Cosine.* wasn't updated in the code to .cosine.*

> Relevance based scoring filter
> ------------------------------
>
> Key: NUTCH-2039
> URL: https://issues.apache.org/jira/browse/NUTCH-2039
> Project: Nutch
> Issue Type: New Feature
> Reporter: Sujen Shah
> Assignee: Chris A. Mattmann
> Labels: memex, nutch
> Fix For: 1.11
>
>
> A ScoringFilter plugin that uses a similarity measure to calculate the
> similarity between a given page(gold standard) and the currently parsed page.
> The score obtained from this similarity is then distributed to its outlinks.
> This filter aims to focus the crawler to crawl/explore relevant pages.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
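
For context on the quoted issue description: the file names in the patch suggest the similarity measure is cosine similarity between a gold-standard page and the parsed page. The following is a minimal sketch of that idea over plain term-frequency maps. It is not the code from the patch and does not use any Nutch API; the class `CosineSketch`, its methods, and the sample strings are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the NUTCH-2039 patch.
class CosineSketch {

  /** Builds a lower-cased term-frequency map, splitting on non-word characters. */
  static Map<String, Integer> termFrequencies(String text) {
    Map<String, Integer> tf = new HashMap<>();
    for (String token : text.toLowerCase().split("\\W+")) {
      if (!token.isEmpty()) {
        tf.merge(token, 1, Integer::sum);
      }
    }
    return tf;
  }

  /** Euclidean length of a term-frequency vector. */
  static double norm(Map<String, Integer> v) {
    double sumOfSquares = 0.0;
    for (int count : v.values()) {
      sumOfSquares += (double) count * count;
    }
    return Math.sqrt(sumOfSquares);
  }

  /** Cosine of the angle between two term-frequency vectors; 0.0 if either is empty. */
  static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
    double dot = 0.0;
    for (Map.Entry<String, Integer> e : a.entrySet()) {
      Integer other = b.get(e.getKey());
      if (other != null) {
        dot += (double) e.getValue() * other;
      }
    }
    double normA = norm(a);
    double normB = norm(b);
    if (normA == 0.0 || normB == 0.0) {
      return 0.0;
    }
    return dot / (normA * normB);
  }

  public static void main(String[] args) {
    // Assumed sample texts: a "gold standard" page and a freshly parsed page.
    Map<String, Integer> gold = termFrequencies("focused web crawler relevance scoring");
    Map<String, Integer> parsed = termFrequencies("a crawler that uses relevance scoring");
    System.out.println("similarity = " + cosine(gold, parsed));
  }
}
```

In a scoring filter along the lines the description gives, a value like this would be computed once per parsed page and then passed on to that page's outlinks; how the actual plugin distributes the score is not shown in this message.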