[ https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-2237:
-----------------------------------
    Fix Version/s:     (was: 1.16)
                   1.17

> DeduplicationJob: Add extra order criteria based on slug
> --------------------------------------------------------
>
>                 Key: NUTCH-2237
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2237
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Ron van der Vegt
>            Priority: Major
>             Fix For: 1.17
>
>         Attachments: NUTCH-2237.patch, NUTCH-2237.patch
>
>
> Currently, when several documents share the same signature, the user can
> select the main document by score, URL length, and fetch time. The quality
> of the slug, judged mainly by the number of meaningful characters, would
> give users more flexibility to distinguish slugified URLs from URLs based
> on a page ID.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
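
The slug-quality criterion described in the issue could be sketched roughly as follows. This is only an illustration, not the code in the attached patch and not Nutch's DeduplicationJob API: the class name `SlugOrder`, the methods `slugScore` and `pickMain`, and the choice to treat letters in the last path segment as the "meaningful characters" are all assumptions made for the example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Nutch: ranks duplicate URLs by slug quality.
final class SlugOrder {

    // Assumed heuristic: the number of letters in the last path segment.
    // A slug such as "how-to-deduplicate-pages" scores high, a numeric
    // page id such as "48213" scores zero.
    static int slugScore(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return 0; // opaque URIs (e.g. mailto:) have no path
        }
        // Ignore trailing slashes so "/a/b/" is treated like "/a/b".
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String segment = path.substring(path.lastIndexOf('/') + 1);
        int score = 0;
        for (int i = 0; i < segment.length(); i++) {
            if (Character.isLetter(segment.charAt(i))) {
                score++;
            }
        }
        return score;
    }

    // Among URLs assumed to share one signature, keep the one with the best slug.
    static String pickMain(List<String> duplicates) {
        return Collections.max(duplicates,
            (a, b) -> Integer.compare(slugScore(a), slugScore(b)));
    }

    public static void main(String[] args) {
        List<String> duplicates = Arrays.asList(
            "https://example.org/node/48213",
            "https://example.org/news/how-to-deduplicate-pages");
        System.out.println(pickMain(duplicates));
    }
}
```

In a real job such a score would presumably be one more entry in the existing comparison order, consulted alongside score, URL length, and fetch time rather than replacing them.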