The parser has already filtered out the unwanted URLs according to the old
regex rules, so "updatedb" never sees those URLs.
Run "bin/nutch parse -all -force" to re-parse the segments with the new
regexes, and then repeat what you did earlier, i.e. updatedb -> generate ->
fetch, etc.
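A minimal sketch of that sequence as a shell function, assuming a Nutch 2.x runtime where the commands are run from the directory containing bin/nutch. The function name recrawl_with_new_filters and the NUTCH override variable are hypothetical, added only so the launcher path can be changed. The commands and flags are the ones used in this thread.

```shell
# Hypothetical helper: re-apply an edited regex-urlfilter.txt to existing
# segments, then run one crawl cycle. Stops at the first failing step.
recrawl_with_new_filters() {
  # NUTCH is a hypothetical override; defaults to the launcher used above.
  local nutch="${NUTCH:-bin/nutch}"

  # Re-parse all segments; -force re-parses pages already marked as parsed,
  # so their outlinks are filtered again with the new regexes.
  "$nutch" parse -all -force || return
  # Add the re-extracted outlinks to the crawl db.
  "$nutch" updatedb || return
  # Select URLs for fetching, now including the newly allowed ones.
  "$nutch" generate || return
  # Fetch everything that was generated.
  "$nutch" fetch -all
}
```

Source the file and call recrawl_with_new_filters from the runtime directory. The pages fetched in that last step still need the usual parse and updatedb afterwards (the "etc." above).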


On Mon, May 27, 2013 at 2:40 AM, Nicholas W <4...@log1.net> wrote:

> I had previously excluded some URLs in a Nutch crawl to limit the scope of
> the crawl during testing by adding the appropriate regex to the
> regex-urlfilter.txt file. I would now like to lift those restrictions and
> have edited regex-urlfilter.txt to allow more URLs. However, after
> executing
>
> bin/nutch updatedb
> bin/nutch generate
>
> When I execute
>
> bin/nutch fetch -all
>
> I still don't get any of the new URLs.
>
> What am I missing?
>
> Thanks a lot for your suggestions.
>
>
> Regards,
>
> Nicholas W.
>
