Doug,
I don't recommend this change. It makes your crawler impolite, since multiple tasks may reference each host. Perhaps you simply need to increase http.max.delays? What is this set to?

If you set up one thread per host, you have at most as many connections to any one host as you have boxes. In my case that is not many. Also, it is a reproducible bug that the segment is always about half the size you specify or expect based on your crawldb.
See my mail posting.
I haven't had time to dig into the problem and pin down the bug exactly. The partitioner itself works, but somehow a combination of things fails. Anyway, it is on my list, and until I find the real problem, using the hash partitioner for a few days is a fair workaround.
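
To make the trade-off concrete, here is a rough sketch in Python. It is not Nutch's partitioner code: the function names, the use of zlib.crc32 as the hash, and the example URLs are all assumptions chosen for illustration. It only shows that partitioning by host keeps each host in a single task, while hashing the whole URL can spread one host over several tasks, though never more than there are partitions.

```python
import zlib
from urllib.parse import urlparse


def partition_by_host(url, num_partitions):
    # Hypothetical host-based partitioner: every URL of a host maps to one task.
    host = urlparse(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def partition_by_url_hash(url, num_partitions):
    # Hypothetical hash partitioner: the whole URL is hashed, so URLs of the
    # same host may end up in different tasks.
    return zlib.crc32(url.encode("utf-8")) % num_partitions


def tasks_per_host(urls, partitioner, num_partitions):
    # Count how many distinct tasks would fetch from each host.
    tasks = {}
    for url in urls:
        host = urlparse(url).hostname or ""
        tasks.setdefault(host, set()).add(partitioner(url, num_partitions))
    return {host: len(parts) for host, parts in tasks.items()}


if __name__ == "__main__":
    urls = ["http://example.com/page%d" % i for i in range(100)]
    urls += ["http://example.org/doc%d" % i for i in range(100)]
    print("by host:    ", tasks_per_host(urls, partition_by_host, 4))
    print("by url hash:", tasks_per_host(urls, partition_by_url_hash, 4))
```

With one fetcher thread per host in each task, the count reported for the URL-hash case is the upper bound on simultaneous connections to that host, and it can never exceed the number of tasks.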

Stefan
