Thanks, Rod. Were you always able to fill the pipe under the same conditions? I'm puzzled by the difference in fetch speed even when the same number of threads and root URLs are used.
I don't have a local DNS server yet. To avoid overwhelming my ISP's DNS server, I use only 10 threads for the first fetch run, so I don't expect great fetch speed in that run. In the second fetch run, however, I use 500 threads. It sometimes fills the pipe, but most of the time it uses only 1/5 of the pipe. The number of hosts, more than 1500, may be too small. How many hosts do you usually have in your crawl?

AJ

On 10/13/05, Rod Taylor <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2005-10-13 at 13:35 -0700, AJ Chen wrote:
> > I try to fetch as fast as it can by using more threads on a large fetch
> > list. But, the fetcher starts download at speed much lower than the full
> > bandwidth allows. And the start download speed varies a lot from run to run,
> > 200kb/s to 1200kb/s on my DSL line. This variation also happens on T1 line
> > that I just tested.
> > Could someone share experience on how to make fetcher use the full
> > bandwidth? We know the speed drops gradually during a long fetch run. But,
> > can the fetch achieve the highest speed allowed by the bandwidth when fetch
> > starts?
>
> I found that for high bandwidth (50Mbits and above) DNS seems to be a
> limiting factor.
>
> 4000 threads with a local caching DNS server seems to be enough to fill
> the pipe though
>
> --
> Rod Taylor <[EMAIL PROTECTED]>
