[ 
https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279797#comment-15279797
 ] 

Markus Jelsma commented on NUTCH-2237:
--------------------------------------

I believe all issues are addressed. Any comments?

> DeduplicationJob: Add extra order criteria based on slug
> --------------------------------------------------------
>
>                 Key: NUTCH-2237
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2237
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Ron van der Vegt
>             Fix For: 1.12
>
>         Attachments: NUTCH-2237.patch, NUTCH-2237.patch
>
>
> Currently, users can select the main document among those with identical
> signatures based on score, URL length and fetch time. The quality of the slug,
> based mainly on the number of meaningful characters, could give users more
> flexibility to distinguish between slugified URLs and URLs based on a page id.
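To illustrate the idea, here is a minimal sketch of ordering duplicates that share a signature, with slug quality as an extra criterion. It is not the code from NUTCH-2237.patch and not Nutch's API: the class DedupOrderSketch, the Doc record, slugScore and pickMain are all hypothetical, and "slug quality" is assumed here to mean the share of letters in the last path segment. The sketch also assumes that slug quality is checked before URL length, because a slugified URL is usually longer than an id-based one and would otherwise lose the URL-length tie-break.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DedupOrderSketch {

  /** Hypothetical stand-in for one of several records sharing a signature. */
  static final class Doc {
    final String url;
    final float score;
    final long fetchTime;

    Doc(String url, float score, long fetchTime) {
      this.url = url;
      this.score = score;
      this.fetchTime = fetchTime;
    }
  }

  /**
   * Hypothetical slug quality: the share of letters in the last path segment.
   * "my-great-article" gives 14/16 = 0.875; a numeric id such as "12345" gives 0.
   */
  static double slugScore(String url) {
    String path = URI.create(url).getPath();
    if (path == null) {
      return 0.0;
    }
    path = path.replaceAll("/+$", ""); // ignore trailing slashes
    String segment = path.substring(path.lastIndexOf('/') + 1);
    if (segment.isEmpty()) {
      return 0.0;
    }
    long letters = segment.chars().filter(Character::isLetter).count();
    return (double) letters / segment.length();
  }

  /** Orders duplicates best-first; the order of the criteria is an assumption. */
  static final Comparator<Doc> BEST_FIRST =
      Comparator.comparingDouble((Doc d) -> d.score).reversed() // higher score first
          .thenComparing(
              Comparator.comparingDouble((Doc d) -> slugScore(d.url)).reversed()) // better slug first
          .thenComparingInt((Doc d) -> d.url.length()) // shorter URL first
          .thenComparing(
              Comparator.comparingLong((Doc d) -> d.fetchTime).reversed()); // newer fetch first

  /** Picks the document to keep; all others would be marked as duplicates. */
  static Doc pickMain(List<Doc> duplicates) {
    return duplicates.stream().min(BEST_FIRST).orElseThrow(IllegalArgumentException::new);
  }

  public static void main(String[] args) {
    List<Doc> dups = new ArrayList<>();
    dups.add(new Doc("http://example.com/news/12345", 1.0f, 1000L));
    dups.add(new Doc("http://example.com/news/my-great-article", 1.0f, 1000L));
    System.out.println(pickMain(dups).url);
  }
}
```

With equal scores, main prints http://example.com/news/my-great-article, since the slugified URL wins on slug quality before URL length is considered. Putting the criteria in a comparator chain keeps the order easy to make configurable.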



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
