[
https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585465#comment-14585465
]
ASF GitHub Bot commented on NUTCH-2039:
---------------------------------------
GitHub user sujen1412 opened a pull request:
https://github.com/apache/nutch/pull/30
fix for NUTCH-2039 contributed by Sujen Shah
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sujen1412/nutch NUTCH-2039
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nutch/pull/30.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #30
----
commit 18737d63494ebe99ba62115d6b3232cf52e0092f
Author: Sujen Shah <[email protected]>
Date: 2015-06-05T18:25:39Z
Added support for REST services in IndexingJob
commit 67678ac67d481f3d6d746bc716d443b132433972
Author: Sujen Shah <[email protected]>
Date: 2015-06-05T18:26:05Z
Added IndexingJob in JobFactory
commit 59d2e1f51ce2a86f21c023c0d00f13c18df076e8
Author: Sujen Shah <[email protected]>
Date: 2015-06-09T22:30:27Z
Merge remote-tracking branch 'upstream/trunk' into trunk
commit 7717816ba2189dbac12ac0217b5bb837c153bebe
Author: Sujen Shah <[email protected]>
Date: 2015-06-11T16:22:46Z
Cosine similarity model scoring plugin
commit 38aa53fbdacd5c9bdaf4ea812ed1f5f287ecc0e7
Author: Sujen Shah <[email protected]>
Date: 2015-06-11T16:23:31Z
Added scoring-similarity plugin in build files
commit 2b712c0d07b2d98fed4b3fb91542a78c7973d29b
Author: Sujen Shah <[email protected]>
Date: 2015-06-14T23:48:03Z
Overriding method calculate similarity
commit 81ed178312eb1789f06d7a1e739aca4b45542382
Author: Sujen Shah <[email protected]>
Date: 2015-06-14T23:48:29Z
Added support to remove stop words
commit 5bbd0331e412bd07ebf8e01a76e402b6b087106d
Author: Sujen Shah <[email protected]>
Date: 2015-06-14T23:49:38Z
Averaging out similarity scores
commit 07b000cfc19058de9dc9e1804911b85f9bf4a296
Author: Sujen Shah <[email protected]>
Date: 2015-06-14T23:52:01Z
Added Apache license info
commit 671c54750f5a78bfb7275fae078310a9c804260c
Author: Sujen Shah <[email protected]>
Date: 2015-06-15T05:45:05Z
Deleted interface files
commit d00a64c14bbf3682952020337defacd13950434e
Author: Sujen Shah <[email protected]>
Date: 2015-06-15T05:45:48Z
Correct stopword.txt path
commit 5043e584e339fec4a2a04a092fd84a7493f5c953
Author: Sujen Shah <[email protected]>
Date: 2015-06-15T05:56:39Z
Removed debugging statements
----
> Relevance based scoring filter
> ------------------------------
>
> Key: NUTCH-2039
> URL: https://issues.apache.org/jira/browse/NUTCH-2039
> Project: Nutch
> Issue Type: New Feature
> Reporter: Sujen Shah
> Labels: memex, nutch
> Fix For: 1.11
>
>
> A ScoringFilter plugin that uses a similarity measure to compare a given
> reference page (the gold standard) with the currently parsed page.
> The resulting similarity score is then distributed to the parsed page's outlinks.
> The filter aims to focus the crawler on relevant pages.
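
For readers unfamiliar with the idea, here is a minimal sketch of the similarity step described above, assuming a plain term-frequency cosine similarity with stop-word removal (the commit log mentions both). It is not the code from the pull request: the class name `CosineSimilaritySketch`, its methods, and the sample stop-word list are hypothetical, and no Nutch API is used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the scoring-similarity plugin itself.
public class CosineSimilaritySketch {

    /** Counts lower-cased word occurrences in the text, skipping stop words. */
    public static Map<String, Integer> termFrequencies(String text, Set<String> stopWords) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // ignore empty split results and stop words
            }
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Euclidean length of a term-frequency vector. */
    private static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine of the angle between two term-frequency vectors; 0.0 if either is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid dividing by zero for pages with no usable terms
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // Assumed sample stop words; a real list would be loaded from a file.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of", "and", "to"));
        Map<String, Integer> gold = termFrequencies("Focused crawling of relevant pages", stopWords);
        Map<String, Integer> page = termFrequencies("The crawler explores relevant pages", stopWords);
        System.out.println("similarity = " + cosine(gold, page));
    }
}
```

In a focused crawl, a score like this would be computed once per parsed page against the gold-standard page and then passed on to that page's outlinks, so that links found on relevant pages are fetched earlier.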