[
https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2237:
-----------------------------------
Fix Version/s: 1.12
> DeduplicationJob: Add extra order criteria based on slug
> --------------------------------------------------------
>
> Key: NUTCH-2237
> URL: https://issues.apache.org/jira/browse/NUTCH-2237
> Project: Nutch
> Issue Type: Improvement
> Reporter: Ron van der Vegt
> Fix For: 1.12
>
> Attachments: NUTCH-2237.patch
>
>
> Currently, when signatures are identical, users can choose the main document
> based on score, URL length, and fetch time. The quality of the slug, judged
> mainly by the number of meaningful characters, could serve as an extra
> criterion, giving users more flexibility to distinguish between slugified URLs
> and URLs based on a page ID.
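
The slug criterion described above could be sketched roughly as follows. This is only an illustration, not the code in NUTCH-2237.patch: the names `slug_score` and `pick_main` are hypothetical, and "meaningful characters" is assumed here to mean alphabetic characters in the last path segment of the URL.

```python
from urllib.parse import urlparse


def slug_score(url: str) -> int:
    """Hypothetical heuristic: count alphabetic characters in the last path segment."""
    path = urlparse(url).path
    # Drop a trailing slash, then keep only the final segment of the path.
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    return sum(ch.isalpha() for ch in segment)


def pick_main(urls):
    """Among duplicate documents, prefer the URL with the highest slug score.

    Assumption: all other criteria (score, URL length, fetch time) are tied,
    so only the slug score decides.
    """
    return max(urls, key=slug_score)


duplicates = [
    "https://example.com/node/12345",
    "https://example.com/articles/how-to-configure-nutch",
]
print(pick_main(duplicates))
```

Under this assumption a URL based on a numeric page ID scores zero, so the slugified URL is kept as the main document.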