[ https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212386#comment-16212386 ]
Sebastian Nagel commented on NUTCH-1932:
----------------------------------------

OK, the PR is now rebased onto the current master as well. That also required some manual work. :)

> Automatically remove orphaned pages
> -----------------------------------
>
>                 Key: NUTCH-1932
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1932
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>         Attachments: NUTCH-1932-add.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch
>
>
> Orphan scoring filter that determines whether a page has become orphaned,
> i.e. no other pages link to it any more. If a page hasn't been linked
> to after markGoneAfter seconds, the page is marked as gone and is then
> removed by an indexer. If a page hasn't been linked to after markOrphanAfter
> seconds, the page is removed from the CrawlDB.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
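
The two-threshold rule in the issue description can be sketched roughly as follows. This is only an illustration, not the code from the attached patches and not a Nutch API: the class `OrphanSketch`, the enum `Outcome`, and the method `classify` are hypothetical names, and the sketch assumes that markGoneAfter is no larger than markOrphanAfter and that a page crosses a threshold only once its idle time strictly exceeds it.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative sketch only: not the NUTCH-1932 patch and not a Nutch API. */
public class OrphanSketch {

    /** Hypothetical outcome labels used by this sketch. */
    public enum Outcome { KEEP, MARK_GONE, REMOVE_FROM_CRAWLDB }

    /**
     * Classifies a page by the time elapsed since an inlink to it was last seen.
     * Both thresholds are in seconds; markGoneAfter <= markOrphanAfter is assumed.
     * The strict ">" comparison is an assumption of this sketch.
     */
    public static Outcome classify(Instant lastInlinked, Instant now,
                                   long markGoneAfter, long markOrphanAfter) {
        long idleSeconds = Duration.between(lastInlinked, now).getSeconds();
        if (idleSeconds > markOrphanAfter) {
            return Outcome.REMOVE_FROM_CRAWLDB; // dropped from the CrawlDB
        }
        if (idleSeconds > markGoneAfter) {
            return Outcome.MARK_GONE;           // marked gone, later removed by an indexer
        }
        return Outcome.KEEP;                    // still linked recently enough
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-10-20T00:00:00Z");
        long day = 86_400L;
        // Example thresholds (made up for illustration): 30 and 40 days.
        System.out.println(classify(now.minusSeconds(5 * day), now, 30 * day, 40 * day));
        System.out.println(classify(now.minusSeconds(35 * day), now, 30 * day, 40 * day));
        System.out.println(classify(now.minusSeconds(45 * day), now, 30 * day, 40 * day));
    }
}
```

Keeping the decision in one pure function, with the timestamps passed in, makes the threshold ordering easy to test without a running crawl.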