[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218818#comment-16218818
]
ASF GitHub Bot commented on NUTCH-1932:
---------------------------------------
sebastian-nagel commented on issue #211: NUTCH-1932 Automatically remove
orphaned pages
URL: https://github.com/apache/nutch/pull/211#issuecomment-339355997
Merged after rebase and squash.
> Automatically remove orphaned pages
> -----------------------------------
>
> Key: NUTCH-1932
> URL: https://issues.apache.org/jira/browse/NUTCH-1932
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 1.13
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Fix For: 1.14
>
> Attachments: NUTCH-1932-add.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch, NUTCH-1932.patch,
> NUTCH-1932.patch, NUTCH-1932.patch
>
>
> Orphan scoring filter that determines whether a page has become orphaned,
> i.e. no other pages link to it any more. If a page has not been linked to
> for markGoneAfter seconds, it is marked as gone and is then removed by an
> indexer. If a page has not been linked to for markOrphanAfter seconds, it
> is removed from the CrawlDB.
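
The two thresholds in the description above can be read as a simple
three-way decision. The following is a minimal sketch of that reading, not
the code from the attached patches: the class, enum, and method names are
hypothetical, only markGoneAfter and markOrphanAfter come from the issue
text, and it assumes markGoneAfter <= markOrphanAfter with all times given
in seconds.

```java
// Hypothetical illustration of the orphan decision; not taken from the
// NUTCH-1932 patch. Only the two threshold names come from the issue text.
final class OrphanCheck {

    /** Possible outcomes for a page, based on when it was last linked to. */
    enum Status { LINKED, GONE, ORPHAN }

    private OrphanCheck() {}

    /**
     * Classifies a page from the time it was last seen with an inlink.
     *
     * @param lastInlinkTime  time (seconds) the page was last linked to
     * @param now             current time (seconds)
     * @param markGoneAfter   seconds without inlinks before marking as gone
     * @param markOrphanAfter seconds without inlinks before removal
     *                        (assumed to be >= markGoneAfter)
     */
    static Status classify(long lastInlinkTime, long now,
                           long markGoneAfter, long markOrphanAfter) {
        long elapsed = now - lastInlinkTime;
        if (elapsed > markOrphanAfter) {
            return Status.ORPHAN; // candidate for removal from the CrawlDB
        }
        if (elapsed > markGoneAfter) {
            return Status.GONE;   // marked as gone, so an indexer can delete it
        }
        return Status.LINKED;     // still considered linked
    }
}
```

Checking the longer threshold first keeps the two cases mutually exclusive:
a page that has passed markOrphanAfter is reported as an orphan rather than
merely gone.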