Ron van der Vegt created NUTCH-2237:
---------------------------------------

             Summary: DeduplicationJob: Add extra order criteria based on slug
                 Key: NUTCH-2237
                 URL: https://issues.apache.org/jira/browse/NUTCH-2237
             Project: Nutch
          Issue Type: Improvement
            Reporter: Ron van der Vegt


Currently, the user can choose which document to keep when signatures are 
identical, based on score, URL length, and fetch time. The quality of the slug, 
judged mainly by the number of meaningful characters, could give users more 
flexibility to distinguish between slugified URLs and URLs based on a page ID.
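
A minimal sketch of how such a slug criterion might be computed. This is not
existing Nutch code: the class name SlugQuality, the method score, and the
comparator BEST_SLUG_FIRST are hypothetical, and "meaningful characters" is
assumed here to mean letters in the last non-empty path segment of the URL.

```java
import java.net.URI;
import java.util.Comparator;

/** Hypothetical helper; not part of Nutch. */
public class SlugQuality {

    /**
     * Returns the number of letters in the last non-empty path segment.
     * Assumption: letters count as "meaningful"; digits and separators do not.
     */
    public static int score(String url) {
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return 0; // unparsable URL: treat as having no usable slug
        }
        if (path == null || path.isEmpty()) {
            return 0;
        }
        // Find the last non-empty segment, so a trailing slash is ignored.
        String[] segments = path.split("/");
        String slug = "";
        for (int i = segments.length - 1; i >= 0; i--) {
            if (!segments[i].isEmpty()) {
                slug = segments[i];
                break;
            }
        }
        int letters = 0;
        for (int i = 0; i < slug.length(); i++) {
            if (Character.isLetter(slug.charAt(i))) {
                letters++;
            }
        }
        return letters;
    }

    /** Orders URLs so that the one with the higher slug score comes first. */
    public static final Comparator<String> BEST_SLUG_FIRST =
        (a, b) -> Integer.compare(score(b), score(a));
}
```

With this ordering, a URL such as
http://example.com/news/nutch-adds-slug-based-deduplication would be preferred
over http://example.com/news/12345 when both documents share a signature.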






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)