[ 
https://issues.apache.org/jira/browse/NUTCH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reopened NUTCH-1106:
------------------------------------
      Assignee: Sebastian Nagel  (was: Markus Jelsma)

Reopened, see [discussion @user|https://lists.apache.org/thread.html/0f316ce311087f6c366629b7334f4e975114622eff2550ea523fe666@%3Cuser.nutch.apache.org%3E].
The solution should include:
- filtering by URL length in ParseOutputFormat
- controlled by a new property
- possibly also an inactive (commented-out) rule in regex-urlfilter.txt
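A minimal sketch of the proposed length check follows. It is not the actual ParseOutputFormat code. The property name `hypothetical.outlink.max.length` is invented for illustration, and a missing or negative value is assumed to disable the filter. `java.util.Properties` stands in for Hadoop's `Configuration` so the example runs on its own. Like Nutch URL filters, it returns `null` to reject a URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Sketch of a length-based outlink filter (assumption, not Nutch code). */
public class UrlLengthFilter {

    // Hypothetical property name; missing or negative value disables the check.
    static final String MAX_LENGTH_PROPERTY = "hypothetical.outlink.max.length";

    private final int maxLength;

    public UrlLengthFilter(Properties conf) {
        this.maxLength = Integer.parseInt(conf.getProperty(MAX_LENGTH_PROPERTY, "-1"));
    }

    /** Returns the URL if accepted, or null if it exceeds the configured length. */
    public String filter(String url) {
        if (url == null) {
            return null;
        }
        if (maxLength >= 0 && url.length() > maxLength) {
            return null; // too long: reject, URLFilter-style
        }
        return url;
    }

    /** Keeps only the outlinks that pass the length check, in original order. */
    public List<String> filterAll(List<String> outlinks) {
        List<String> kept = new ArrayList<>();
        for (String link : outlinks) {
            String accepted = filter(link);
            if (accepted != null) {
                kept.add(accepted);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAX_LENGTH_PROPERTY, "30");
        UrlLengthFilter f = new UrlLengthFilter(conf);
        List<String> links = List.of(
            "https://example.org/a",                    // 21 chars: kept
            "https://example.org/" + "x".repeat(50));   // 70 chars: dropped
        System.out.println(f.filterAll(links));
    }
}
```

Checking length at the point where outlinks are written keeps overly long URLs out of the CrawlDb without relying on a regex, and the single property makes the limit easy to tune or switch off.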

> Options to skip url's based on length
> -------------------------------------
>
>                 Key: NUTCH-1106
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1106
>             Project: Nutch
>          Issue Type: Improvement
>          Components: linkdb
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.5, 1.15
>
>         Attachments: NUTCH-1106-1.4-1.patch
>
>
> Adds an option to skip URLs exceeding a certain length. At first we used a regex
> to impose this limit, but making the limit configurable is more convenient.
> Comments?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
