On 4/24/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Doğacan Güney wrote:
> Hi all,
>
> I have been working on Fetcher2 code lately and I came across this
> particular code (in FetchItemQueue.getFetchItem) that I didn't quite
> understand:
>
> public FetchItem getFetchItem() {
> ...
> long last = endTime.get() + (maxThreads > 1 ? crawlDelay : minCrawlDelay);
> ...
> }
>
> Now, the 'default' politeness behaviour should be 1 thread per host
> and delaying n seconds between successive requests to that host,
> right? But won't this code wait only minCrawlDelay (which, by default,
> is 0) if maxThreads == 1?
Yes, that was the intended behavior - normally, you should never use
more than 1 thread per host, unless you have an explicit permission to
do so.
If multiple threads make requests to the same host, then the crawl delay
parameter loses its usual meaning - see the details of this in comments
to NUTCH-385. However, the sensible thing to do is still to provide a way
to limit the maximum rate of requests, and this is what the
minCrawlDelay parameter is for.
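
As a rough illustration of capping the request rate to one host when several
threads share it, here is a minimal sketch. It is not the Fetcher2 code: the
class HostRateGate, the method tryStart and the field nextAllowedMs are
hypothetical names, and the sketch spaces out request starts, whereas the
snippet quoted above works from endTime.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Nutch: lets at most one request to a
// host start per minDelayMs, however many threads ask.
public class HostRateGate {
    // Earliest time (ms) at which the next request may start.
    private final AtomicLong nextAllowedMs = new AtomicLong(Long.MIN_VALUE);
    private final long minDelayMs;

    public HostRateGate(long minDelayMs) {
        this.minDelayMs = minDelayMs;
    }

    // Returns true if the caller may start a request at time nowMs.
    // The compare-and-set loop ensures that only one of several
    // concurrent callers claims a given slot.
    public boolean tryStart(long nowMs) {
        while (true) {
            long next = nextAllowedMs.get();
            if (nowMs < next) {
                return false; // too soon, caller should try again later
            }
            if (nextAllowedMs.compareAndSet(next, nowMs + minDelayMs)) {
                return true;
            }
        }
    }
}
```

With a 1000 ms minimum delay, a call at t=0 is allowed, a call at t=500 is
refused, and a call at t=1000 is allowed again, regardless of which thread
makes it.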
I don't get it. The code seems to do exactly the opposite of what you
are saying. If maxThreads == 1, then maxThreads > 1 is false, so the
expression evaluates to minCrawlDelay, not crawlDelay. Shouldn't the
expression be (maxThreads > 1 ? minCrawlDelay : crawlDelay)?
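
To make the difference concrete, here is a small sketch that evaluates both
forms of the expression side by side. The class DelayChoice and its method
names are made up for illustration, and the 5000 ms crawlDelay is an assumed
value; only the two ternary expressions come from this thread.

```java
// Hypothetical demo class, not part of Nutch.
public class DelayChoice {
    // The expression as it appears in FetchItemQueue.getFetchItem.
    static long delayAsWritten(int maxThreads, long crawlDelay, long minCrawlDelay) {
        return maxThreads > 1 ? crawlDelay : minCrawlDelay;
    }

    // The expression with the two branches swapped, as proposed above.
    static long delayAsProposed(int maxThreads, long crawlDelay, long minCrawlDelay) {
        return maxThreads > 1 ? minCrawlDelay : crawlDelay;
    }

    public static void main(String[] args) {
        long crawlDelay = 5000L;   // assumed value, in ms
        long minCrawlDelay = 0L;   // the default mentioned in this thread
        for (int maxThreads = 1; maxThreads <= 2; maxThreads++) {
            System.out.println("maxThreads=" + maxThreads
                + " written=" + delayAsWritten(maxThreads, crawlDelay, minCrawlDelay)
                + " proposed=" + delayAsProposed(maxThreads, crawlDelay, minCrawlDelay));
        }
    }
}
```

With one thread per host, the expression as written yields 0 ms and the
swapped form yields 5000 ms; with two threads the results are reversed.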
>
> I also did not understand why there is a maxThread check at all. Each
> individual thread should wait crawl delay before making another
> request to the same host. Am I missing something here?
See the ASCII-art graphs and comments in NUTCH-385 - this is likely not
what is expected.
Although this JIRA issue is still open, the Fetcher2 code tries to
implement this middle-ground solution.
OK. I guess this approach is good enough.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
--
Doğacan Güney