[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585465#comment-14585465 ]
ASF GitHub Bot commented on NUTCH-2039:
---------------------------------------

GitHub user sujen1412 opened a pull request:

    https://github.com/apache/nutch/pull/30

    fix for NUTCH-2039 contributed by Sujen Shah

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujen1412/nutch NUTCH-2039

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/30.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #30

----
commit 18737d63494ebe99ba62115d6b3232cf52e0092f
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-05T18:25:39Z

    Added support for REST services in IndexingJob

commit 67678ac67d481f3d6d746bc716d443b132433972
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-05T18:26:05Z

    Added IndexingJob in JObFactory

commit 59d2e1f51ce2a86f21c023c0d00f13c18df076e8
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-09T22:30:27Z

    Merge remote-tracking branch 'upstream/trunk' into trunk

commit 7717816ba2189dbac12ac0217b5bb837c153bebe
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-11T16:22:46Z

    Cosine similarity model scoring plugin

commit 38aa53fbdacd5c9bdaf4ea812ed1f5f287ecc0e7
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-11T16:23:31Z

    Added scoring-similarity plugin in build files

commit 2b712c0d07b2d98fed4b3fb91542a78c7973d29b
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:48:03Z

    Overriding method calculate similarity

commit 81ed178312eb1789f06d7a1e739aca4b45542382
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:48:29Z

    Added support to remove stop words

commit 5bbd0331e412bd07ebf8e01a76e402b6b087106d
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:49:38Z

    Averaging out similarity scores

commit 07b000cfc19058de9dc9e1804911b85f9bf4a296
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:52:01Z

    Added Apache license info

commit 671c54750f5a78bfb7275fae078310a9c804260c
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-15T05:45:05Z

    Deleted interface files

commit d00a64c14bbf3682952020337defacd13950434e
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-15T05:45:48Z

    Correct stopword.txt path

commit 5043e584e339fec4a2a04a092fd84a7493f5c953
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-15T05:56:39Z

    Removed debugging statements

----

> Relevance based scoring filter
> ------------------------------
>
>                 Key: NUTCH-2039
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2039
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Sujen Shah
>              Labels: memex, nutch
>             Fix For: 1.11
>
>
> A ScoringFilter plugin that uses a similarity measure to calculate the
> similarity between a given page(gold standard) and the currently parsed page.
> The score obtained from this similarity is then distributed to its outlinks.
> This filter aims to focus the crawler to crawl/explore relevant pages.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
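
For readers unfamiliar with the approach named in the issue description and commit messages (cosine similarity, stop word removal, score passed to outlinks), the following is a minimal standalone sketch of the idea. It is not the code from the pull request and does not use the Nutch ScoringFilter API. The class name SimilaritySketch, all method names, the tokenization rule, and the choice to hand each outlink the parent page's similarity score unchanged are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the plugin code from the pull request.
class SimilaritySketch {

  // Builds a term-frequency map: lowercases the text, splits on anything
  // that is not an ASCII letter or digit (an assumed tokenization rule),
  // and skips stop words.
  static Map<String, Integer> termFrequencies(String text, Set<String> stopWords) {
    Map<String, Integer> tf = new HashMap<>();
    for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
      if (token.isEmpty() || stopWords.contains(token)) {
        continue;
      }
      tf.merge(token, 1, Integer::sum);
    }
    return tf;
  }

  // Cosine similarity between two term-frequency vectors.
  // Returns 0.0 when either vector has no terms.
  static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (Map.Entry<String, Integer> e : a.entrySet()) {
      int va = e.getValue();
      normA += (double) va * va;
      Integer vb = b.get(e.getKey());
      if (vb != null) {
        dot += (double) va * vb;
      }
    }
    for (int vb : b.values()) {
      normB += (double) vb * vb;
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  // Assumption: every outlink receives the parent page's score as-is.
  // The issue text only says the score is "distributed to its outlinks".
  static Map<String, Double> scoreOutlinks(List<String> outlinks, double pageScore) {
    Map<String, Double> scores = new HashMap<>();
    for (String url : outlinks) {
      scores.put(url, pageScore);
    }
    return scores;
  }

  public static void main(String[] args) {
    Set<String> stopWords = Set.of("the", "a", "of", "to");
    Map<String, Integer> gold =
        termFrequencies("focused crawling of relevant pages", stopWords);
    Map<String, Integer> page =
        termFrequencies("a crawler that explores relevant pages", stopWords);
    double score = cosineSimilarity(gold, page);
    System.out.println("similarity = " + score);
    System.out.println(scoreOutlinks(List.of("http://example.org/a"), score));
  }
}
```

Pages that share more non-stop-word terms with the gold standard page score closer to 1, so their outlinks would be favoured in later crawl rounds under this assumed scheme.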