Stefan Groschupf wrote:
If you set up one thread per host, you have at most as many connections to one host as you have boxes. In my case that is not that many.

Anything more than one is not generally considered polite.
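A minimal sketch of the one-thread-per-host model, assuming a single fetcher process. `group_by_host`, `fetch_politely` and the `fetch` callback are hypothetical names for illustration, not Nutch's fetcher API. Within one process this keeps at most one request per host in flight. If several boxes each receive URLs for the same host, each box still opens its own connection, which is the case raised above.

```python
import threading
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Bucket URLs by hostname so each host can be served by exactly one worker."""
    queues = defaultdict(list)
    for url in urls:
        queues[urlsplit(url).hostname].append(url)
    return dict(queues)


def fetch_politely(urls, fetch):
    """Start one thread per host; each thread fetches its host's URLs one after
    another, so this process never has two open requests to the same host.

    `fetch` is a hypothetical caller-supplied function: url -> result.
    """
    queues = group_by_host(urls)
    results = {}
    lock = threading.Lock()  # protects the shared results dict

    def worker(host_urls):
        for url in host_urls:  # sequential: one connection per host at a time
            body = fetch(url)
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Parallelism here comes only from having many distinct hosts, never from hitting one host harder.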

Also, it is a reproducible bug that the segment is always about half the size you specify, or would expect based on your crawldb.
See my mailing-list post.

I cannot reproduce this. I just ran a crawl with depth=5, topN=100 and mapred.map.tasks=2, starting from a single URL. The segments (after the first two) each contain over 80 pages, and more than 300 pages were fetched in total.

Doug
