[ https://issues.apache.org/jira/browse/NUTCH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102575#comment-13102575 ]
Markus Jelsma commented on NUTCH-1106:
--------------------------------------

This patch only works on the crawldb. Perhaps it would be better to use a URL filter so that other jobs behave the same way.

> Options to skip url's based on length
> -------------------------------------
>
>                 Key: NUTCH-1106
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1106
>             Project: Nutch
>          Issue Type: Improvement
>          Components: linkdb
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-1106-1.4-1.patch
>
>
> Adds an option to skip URLs exceeding a certain length. At first we used a regex
> to impose this limit, but having this option configurable is more convenient.
> Comments?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
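A minimal sketch of the length check expressed as a URL filter rather than a crawldb-only option, as suggested in the comment. It assumes only the general contract of Nutch URL filters (return the URL to keep it, return null to drop it). The class name LengthURLFilter and its maxLength parameter are hypothetical, are not taken from the attached patch, and the class does not implement Nutch's actual URLFilter interface or read any Nutch configuration property.

```java
/**
 * Hypothetical sketch: drops URLs longer than a configured maximum.
 * Follows the URL filter convention of returning the URL to accept it
 * and null to reject it, but has no dependency on Nutch classes.
 */
class LengthURLFilter {
    private final int maxLength;

    LengthURLFilter(int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        this.maxLength = maxLength;
    }

    /** Returns the URL unchanged if it is short enough, otherwise null. */
    public String filter(String urlString) {
        // Reject missing URLs as well as URLs over the limit.
        if (urlString == null || urlString.length() > maxLength) {
            return null;
        }
        return urlString;
    }
}
```

For example, `new LengthURLFilter(30).filter("http://example.org/short")` returns the URL unchanged, while a 69-character URL yields null. A plain length comparison keeps the limit a single number instead of a regex that has to encode the length.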