Update on the fetch performance of my current run: download speed has been stable at 3.8 pages/sec, about 640 kbps. This is probably limited by my bandwidth: a regular DSL service that promises up to 1.5 Mbps inbound but realistically delivers only 640 kbps.
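A rough ceiling for other link speeds can be worked out from the figures above. The Python sketch below assumes the inbound link is the only bottleneck and that the average page size stays at the ~21 KB implied by 3.8 pages/sec at 640 kbps (the later log lines show ~25 KB/page, which would lower the ceiling). The function names and the `utilization` parameter are hypothetical, for illustration only.

```python
# Back-of-envelope ceiling on fetch rate when the inbound link is the only
# bottleneck. Real crawls will land at or below this figure.

def max_pages_per_sec(link_kbps: float, avg_page_bytes: float,
                      utilization: float = 1.0) -> float:
    """Pages/sec if `utilization` of the link carries page bytes (1 kbps = 1000 bit/s)."""
    return link_kbps * 1000 * utilization / (avg_page_bytes * 8)


def days_to_fetch(pages: int, pages_per_sec: float) -> float:
    """Wall-clock days to fetch `pages` at a steady `pages_per_sec`."""
    return pages / pages_per_sec / 86_400


# Assumption: average page size implied by the observed 3.8 pages/sec at
# ~640 kbps, i.e. 640,000 / 8 / 3.8, roughly 21 KB per page.
implied_page_bytes = 640_000 / 8 / 3.8

for kbps in (640, 3_000, 10_000):
    rate = max_pages_per_sec(kbps, implied_page_bytes)
    print(f"{kbps:>6} kbps: ~{rate:5.1f} pages/sec, "
          f"~{days_to_fetch(1_000_000, rate):.2f} days per 1M pages")
```

At 640 kbps this reproduces the observed 3.8 pages/sec, about 3 days per million pages. Under the same assumptions, 3 Mbps and 10 Mbps give theoretical ceilings of roughly 18 and 59 pages/sec (about 16 hours and under 5 hours per million pages); measured rates will be lower if anything other than bandwidth becomes the bottleneck.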
More than 1 million pages were fetched, but at the current speed that took several days, which is just too slow. I'm planning to get more bandwidth. Could someone share their experience of what stable rate (pages/sec) can be achieved with a 3 Mbps or 10 Mbps inbound connection?

Thanks,
AJ

On 9/28/05, AJ Chen <[EMAIL PROTECTED]> wrote:
>
> I started the crawler with about 2000 sites. The fetcher could achieve
> 7 pages/sec initially, but the performance gradually dropped to about 2
> pages/sec, sometimes even 0.5 pages/sec. The fetch list had 300k pages
> and I used 500 threads. What are the main causes of this slowdown?
> Below are some sample status lines:
>
> 050927 005952 status: segment 20050927005922, 100 pages, 3 errors, 1784615 bytes, 14611 ms
> 050927 005952 status: 6.8441586 pages/s, 954.2334 kb/s, 17846.15 bytes/page
> 050927 010005 status: segment 20050927005922, 200 pages, 9 errors, 3656863 bytes, 28170 ms
> 050927 010005 status: 7.0997515 pages/s, 1014.1726 kb/s, 18284.314 bytes/page
>
> After some time...
>
> 050927 171818 status: segment 20050927070752, 101400 pages, 7201 errors, 2593026554 bytes, 36216316 ms
> 050927 171818 status: 2.799843 pages/s, 559.3617 kb/s, 25572.254 bytes/page
> 050927 171832 status: segment 20050927070752, 101500 pages, 7204 errors, 2595591632 bytes, 36230516 ms
> 050927 171832 status: 2.8015058 pages/s, 559.6956 kb/s, 25572.332 bytes/page
>
> Thanks,
> AJ
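The rates in the quoted status lines appear to be cumulative averages since the segment started: pages divided by elapsed time, with "kb/s" matching kilobits per second at 1 kb = 1024 bits. That reading is inferred from the quoted numbers, not from the Nutch source. The Python sketch below recomputes the reported figures from the raw counters under that assumption; the regex and `segment_rates` are hypothetical names.

```python
import re

# Matches the counter line format quoted above, e.g.
# "... status: segment 20050927005922, 100 pages, 3 errors, 1784615 bytes, 14611 ms"
SEGMENT_STATUS = re.compile(
    r"status: segment (\d+), (\d+) pages, (\d+) errors, (\d+) bytes, (\d+) ms"
)


def segment_rates(line: str) -> tuple[float, float, float]:
    """Recompute (pages/s, kb/s, bytes/page) from one segment status line.

    Assumes the counters are cumulative since the segment started and that
    kb/s means kilobits per second with 1 kb = 1024 bits.
    """
    match = SEGMENT_STATUS.search(line)
    if match is None:
        raise ValueError(f"not a segment status line: {line!r}")
    _segment, pages, _errors, nbytes, millis = (int(g) for g in match.groups())
    seconds = millis / 1000
    return pages / seconds, nbytes * 8 / 1024 / seconds, nbytes / pages


early = ("050927 005952 status: segment 20050927005922, 100 pages, "
         "3 errors, 1784615 bytes, 14611 ms")
late = ("050927 171832 status: segment 20050927070752, 101500 pages, "
        "7204 errors, 2595591632 bytes, 36230516 ms")

for line in (early, late):
    print("%.4f pages/s, %.4f kb/s, %.2f bytes/page" % segment_rates(line))
```

If that reading is right, the late 2.8 pages/s figure is an average over about 10 hours (36,230,516 ms) of the segment. Differencing consecutive lines gives a short-window rate instead: the two late segment lines are 100 pages and 14.2 s apart, roughly 7 pages/s over that window.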
