[ http://issues.apache.org/jira/browse/NUTCH-272?page=comments#action_12412842 ]
Matt Kangas commented on NUTCH-272:
-----------------------------------

Agreed that it's looking tough to do in Generate. Alternatively, we can try to keep the excess URLs from ever entering the crawldb in CrawlDb.update(). (This approach has its own issues, noted above...)

> Max. pages to crawl/fetch per site (emergency limit)
> ----------------------------------------------------
>
>          Key: NUTCH-272
>          URL: http://issues.apache.org/jira/browse/NUTCH-272
>      Project: Nutch
>         Type: Improvement
>     Reporter: Stefan Neufeind
>
> If I'm right, there is no way in place right now for setting an "emergency
> limit" to fetch a certain max. number of pages per site. Is there an "easy"
> way to implement such a limit, maybe as a plugin?

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
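Whether the cap is enforced in Generate or in CrawlDb.update(), the core idea Matt describes is the same: count URLs per host and drop any beyond the limit. Below is a minimal standalone Java sketch of that idea only; it is not Nutch code, and the class name, the `capPerHost` method, and the `max` parameter are all illustrative inventions, not part of Nutch's API.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerHostCap {

    // Keep at most `max` URLs per host, preserving input order.
    // Anything over the limit for a host is silently dropped.
    static List<String> capPerHost(List<String> urls, int max) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String u : urls) {
            String host;
            try {
                host = new URL(u).getHost();
            } catch (MalformedURLException e) {
                continue; // skip unparsable URLs
            }
            int seen = counts.getOrDefault(host, 0);
            if (seen < max) {
                counts.put(host, seen + 1);
                kept.add(u);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://a.example/1", "http://a.example/2",
                "http://a.example/3", "http://b.example/1");
        // With max=2, the third a.example URL is dropped.
        System.out.println(capPerHost(urls, 2));
    }
}
```

In a real MapReduce job like Generate, the equivalent grouping would happen per reducer key (host), so the counter map would not need to hold every host in memory at once; the sketch above ignores that detail for clarity.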
