[ 
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124577#comment-16124577
 ] 

ASF GitHub Bot commented on NUTCH-1932:
---------------------------------------

sebastian-nagel opened a new pull request #211: NUTCH-1932 Automatically remove 
orphaned pages
URL: https://github.com/apache/nutch/pull/211
 
 
   - apply Markus Jelsma's latest patch, 2016-06-30
   - add method orphanedScore(Text, CrawlDatum) to ScoringFilter interface
   - complete unit tests for CrawlDb update
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Automatically remove orphaned pages
> -----------------------------------
>
>                 Key: NUTCH-1932
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1932
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>         Attachments: NUTCH-1932-add.patch, NUTCH-1932.patch, 
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, 
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, 
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, 
> NUTCH-1932.patch
>
>
> Orphan scoring filter that determines whether a page has become orphaned, 
> e.g. it has no more other pages linking to it. If a page hasn't been linked 
> to after markGoneAfter seconds, the page is marked as gone and is then 
> removed by an indexer.  If a page hasn't been linked to after markOrphanAfter 
> seconds, the page is removed from the CrawlDB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to