Hi all,

I'm trying to fetch a few million pages, but I've run into some performance problems. I'm using a P4 1700 with 768 MB of RAM and a 10 Mb connection. I've changed these configuration values in nutch-site.xml:

<property>
  <name>fetcher.threads.fetch</name>
  <value>25</value>
</property>
<property>
  <name>http.max.delays</name>
  <value>1</value>
</property>
<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
</property>
<property>
  <name>io.sort.factor</name>
  <value>10</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>1</value>
</property>
<property>
  <name>indexer.maxMergeDocs</name>
  <value>20</value>
</property>
<property>
  <name>indexer.termIndexInterval</name>
  <value>64</value>
</property>

and I've also added the following line to bin/nutch:

JAVA_HEAP_MAX=-Xmx750M

It seems like a good configuration. When I run the fetch command, I get these log messages:

050926 181531 status: segment 20050924151836, 100 pages, 11 errors, 1277608 bytes, 11755 ms
050926 181531 status: 8.507018 pages/s, 849.11206 kb/s, 12776.08 bytes/page
050926 181537 status: segment 20050924151836, 200 pages, 17 errors, 2620277 bytes, 18157 ms
050926 181537 status: 11.015036 pages/s, 1127.4392 kb/s, 13101.385 bytes/page
050926 181548 status: segment 20050924151836, 300 pages, 26 errors, 4243689 bytes, 28657 ms
050926 181548 status: 10.468647 pages/s, 1156.9187 kb/s, 14145.63 bytes/page
050926 181557 status: segment 20050924151836, 400 pages, 32 errors, 5515098 bytes, 38102 ms
050926 181557 status: 10.4981365 pages/s, 1130.8252 kb/s, 13787.745 bytes/page
050926 181607 status: segment 20050924151836, 500 pages, 44 errors, 6678319 bytes, 48464 ms
050926 181607 status: 10.3169365 pages/s, 1076.5592 kb/s, 13356.638 bytes/page

but after a few thousand pages, the rate decreases steadily:

050926 180746 status: segment 20050924151836, 6400 pages, 566 errors, 85809551 bytes, 853401 ms
050926 180746 status: 7.4994054 pages/s, 785.5476 kb/s, 13407.742 bytes/page
050926 180807 status: segment 20050924151836, 6500 pages, 581 errors, 87133135 bytes, 874799 ms
050926 180807 status: 7.4302783 pages/s, 778.1532 kb/s, 13405.098 bytes/page
050926 180823 status: segment 20050924151836, 6600 pages, 589 errors, 88789053 bytes, 890686 ms
050926 180823 status: 7.410019 pages/s, 778.79803 kb/s, 13452.888 bytes/page
050926 180841 status: segment 20050924151836, 6700 pages, 594 errors, 90286731 bytes, 908720 ms
050926 180841 status: 7.3730083 pages/s, 776.21826 kb/s, 13475.631 bytes/page
050926 180901 status: segment 20050924151836, 6800 pages, 601 errors, 91663461 bytes, 928498 ms
050926 180901 status: 7.323656 pages/s, 771.268 kb/s, 13479.921 bytes/page
050926 181014 status: segment 20050924151836, 7200 pages, 627 errors, 96922711 bytes, 1001732 ms
050926 181014 status: 7.187551 pages/s, 755.8995 kb/s, 13461.487 bytes/page
050926 181037 status: segment 20050924151836, 7300 pages, 637 errors, 98478215 bytes, 1024844 ms
050926 181037 status: 7.1230354 pages/s, 750.7104 kb/s, 13490.167 bytes/page

and I can't figure out how to hold a steady 10 pages/s (or even a higher rate!). I've read this page:

http://wiki.apache.org/nutch/HardwareRequirements

and it states that it is possible, with 25 fetchers, to download at (more or less) 4 Mbit per second with hardware similar to mine.

So, how can I set up Nutch to fetch at a higher rate?

Thank you so much!

Menoz

--
Free Software Enthusiast
Debian Powered
Linux User #332564
http://menoz.homelinux.org
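
P.S. For anyone who wants to check the numbers: as far as I can tell, the pages/s figure in each status line is a cumulative average since the start of the segment (6400 pages / 853.401 s gives the logged 7.4994), so the current rate is lower than the log suggests. Below is a minimal Python sketch, not part of Nutch, that computes the rate between consecutive status lines. The function names are my own, and I'm assuming "kb/s" means kilobits per second with 1 kb = 1024 bits, which is what reproduces the logged values.

```python
import re

# Four cumulative status lines copied from the fetcher log in this post.
LOG = """\
050926 180746 status: segment 20050924151836, 6400 pages, 566 errors, 85809551 bytes, 853401 ms
050926 180807 status: segment 20050924151836, 6500 pages, 581 errors, 87133135 bytes, 874799 ms
050926 180823 status: segment 20050924151836, 6600 pages, 589 errors, 88789053 bytes, 890686 ms
050926 180841 status: segment 20050924151836, 6700 pages, 594 errors, 90286731 bytes, 908720 ms
"""

# Matches the "N pages, N errors, N bytes, N ms" part of a status line.
STATUS = re.compile(r"(\d+) pages, (\d+) errors,\s*(\d+) bytes, (\d+) ms")


def parse(log):
    """Return a list of (pages, bytes, elapsed_ms) cumulative samples."""
    return [(int(m[1]), int(m[3]), int(m[4])) for m in STATUS.finditer(log)]


def cumulative_rate(sample):
    """Pages/s averaged since the start of the segment (what the log prints)."""
    pages, _, ms = sample
    return pages / (ms / 1000.0)


def interval_rates(samples):
    """(pages/s, kbit/s) between each pair of consecutive samples."""
    rates = []
    for (p0, b0, t0), (p1, b1, t1) in zip(samples, samples[1:]):
        secs = (t1 - t0) / 1000.0
        # Assumption: kb = 1024 bits, inferred from the logged kb/s values.
        rates.append(((p1 - p0) / secs, (b1 - b0) * 8 / 1024 / secs))
    return rates


if __name__ == "__main__":
    samples = parse(LOG)
    print("cumulative pages/s at first sample: %.4f" % cumulative_rate(samples[0]))
    for pages_per_s, kbit_per_s in interval_rates(samples):
        print("interval: %.2f pages/s, %.1f kbit/s" % (pages_per_s, kbit_per_s))
```

Between the 6400 and 6500 page marks this gives roughly 4.7 pages/s, well below the 7.4 pages/s cumulative figure, so the slowdown is sharper than the status lines make it look.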