[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585465#comment-14585465 ]

ASF GitHub Bot commented on NUTCH-2039:
---------------------------------------

GitHub user sujen1412 opened a pull request:

    https://github.com/apache/nutch/pull/30

    fix for NUTCH-2039 contributed by Sujen Shah

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujen1412/nutch NUTCH-2039

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/30.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #30
    
----
commit 18737d63494ebe99ba62115d6b3232cf52e0092f
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-05T18:25:39Z

    Added support for REST services in IndexingJob

commit 67678ac67d481f3d6d746bc716d443b132433972
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-05T18:26:05Z

    Added IndexingJob in JobFactory

commit 59d2e1f51ce2a86f21c023c0d00f13c18df076e8
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-09T22:30:27Z

    Merge remote-tracking branch 'upstream/trunk' into trunk

commit 7717816ba2189dbac12ac0217b5bb837c153bebe
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-11T16:22:46Z

    Cosine similarity model scoring plugin

commit 38aa53fbdacd5c9bdaf4ea812ed1f5f287ecc0e7
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-11T16:23:31Z

    Added scoring-similarity plugin in build files

commit 2b712c0d07b2d98fed4b3fb91542a78c7973d29b
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:48:03Z

    Overriding method calculate similarity

commit 81ed178312eb1789f06d7a1e739aca4b45542382
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:48:29Z

    Added support to remove stop words

commit 5bbd0331e412bd07ebf8e01a76e402b6b087106d
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:49:38Z

    Averaging out similarity scores

commit 07b000cfc19058de9dc9e1804911b85f9bf4a296
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-14T23:52:01Z

    Added Apache license info

commit 671c54750f5a78bfb7275fae078310a9c804260c
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-15T05:45:05Z

    Deleted interface files

commit d00a64c14bbf3682952020337defacd13950434e
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-15T05:45:48Z

    Correct stopword.txt path

commit 5043e584e339fec4a2a04a092fd84a7493f5c953
Author: Sujen Shah <sujen1...@gmail.com>
Date:   2015-06-15T05:56:39Z

    Removed debugging statements

----


> Relevance based scoring filter
> ------------------------------
>
>                 Key: NUTCH-2039
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2039
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Sujen Shah
>              Labels: memex, nutch
>             Fix For: 1.11
>
>
> A ScoringFilter plugin that uses a similarity measure to compare a given
> page (the gold standard) with the currently parsed page. The resulting
> similarity score is then distributed to the parsed page's outlinks.
> The filter aims to focus the crawler on relevant pages.
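
For readers unfamiliar with the idea, the scoring described above can be sketched roughly as follows. This is an illustration in Python, not the code from the pull request (the plugin itself is written in Java). The names `STOP_WORDS`, `term_vector`, `cosine_similarity` and `distribute_score` are hypothetical, the stop-word set is a tiny stand-in for the plugin's stop-word file, and the sketch assumes each outlink simply inherits the parent page's full similarity score, which the issue text does not specify.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word set, used only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def term_vector(text):
    """Lowercase the text, split it into alphanumeric tokens,
    drop stop words, and count the remaining terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors.
    Returns 0.0 when either vector is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distribute_score(gold_text, page_text, outlinks):
    """Score the parsed page against the gold standard and hand that
    score to every outlink (assumption: no splitting or damping)."""
    score = cosine_similarity(term_vector(gold_text), term_vector(page_text))
    return {url: score for url in outlinks}
```

A page that shares no terms with the gold standard scores 0.0, so its outlinks would be given the lowest priority under this sketch, while a page with the same term distribution scores close to 1.0.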



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
