I was using a nightly build that Piotr had given me, nutch-nightly.jar (actually it was nutch-dev0.7.jar or something of that nature). I tested it on the Windows platform with 5 machines running it: 2 quad P3 Xeons, both at 100 Mbit, 1 Pentium 4 3 GHz with Hyper-Threading, 1 AMD Athlon XP 2600+, and 1 Athlon 64 3500+, all with 1 GB of RAM or more. Now I have my big server, and if you have worked on NDFS since the beginning of July I'll test it again; my big server's HD array is very fast, 200+ MB/sec, so it will be able to saturate gigabit much better.

Anyway, the P4 and the 2 AMD machines are hooked into the switch at gigabit, and the 2 Xeons are hooked into my other switch at 100 Mbit, which has a gigabit uplink to my gigabit switch. So both Xeons would constantly be saturated at 11 MB/sec, while the P4 was able to reach higher speeds of 50-60 MB/sec with its internal RAID 0 array (dual 120 GB drives). My main PC (the Athlon 64 3500+) was the namenode, a datanode, and also the NDFS client.

I could not get Nutch to work properly with NDFS. It was set up correctly and it "kinda" worked, but it would crash the namenode when I tried to fetch segments in the NDFS filesystem, or index them, or do much of anything. So I copied all my segment directories, indexes, content, whatever (it was 1.8 GB), plus some DVD images onto NDFS. My primary machine runs Nutch off 10,000 RPM disks in RAID 0 (2x36 GB Raptors) that can output about 120 MB/sec sustained.

So here is what I found out (in Windows). If I don't start a datanode on the namenode machine, with the conf pointing to 127.0.0.1 instead of its outside IP, the namenode will not copy data to the other machines. If instead I am running a datanode on the namenode, data will replicate from that datanode to the other 3 datanodes. I tried this a hundred ways to make it work with an independent namenode, without luck.

The way I saw data go across my network: I would put data into NDFS, the namenode would request a datanode and find the internal datanode, and data would be copied to it. While that datanode was still copying data from my other HDs into chunks on the RAID array, it would replicate to the P4 via gigabit at 50-60 MB/sec, and then it would replicate from the P4 to the Xeons, kinda alternating them. I only had replication at the default of 2, and I had about 100 GB to copy in, so the copy onto the internal RAID array finished fairly quickly, then replication to the P4 finished, and the Xeons got a little bit of data, but not near as much as the P4. My guess is it only needs 2 copies, and the first copy was the datanode on the internal machine, the second was the P4 datanode. The Xeons only had a smaller connection, so they didn't receive chunks as fast as the P4 could, and the P4 had enough space for all the data, so it worked out. I should have set replication to 4.

The Athlon XP machine was running Linux (SuSE 9.3), and it would crash the namenode on Windows if I connected it as a datanode, so that one didn't get tested. But I was able to put out 50-60 MB/sec to one machine; it just would not replicate data to multiple machines at the same time, it seemed. I would have thought it would output to the Xeons at the same time as the P4, giving the Xeons 20% of the data and the P4 80% or something of that nature, but it could be that they just aren't fast enough to request data before the P4 was receiving its 32 MB chunks every half second?
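Just to make that guess concrete, here is a little sketch in Java. This is NOT the actual NDFS code, just my mental model of the behavior I was seeing, and the class and method names are made up for illustration: the client's local datanode always takes the first copy, and the namenode then fills the remaining slots, up to the replication factor, from whichever other datanodes it happens to pick first.

import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only, not the real NDFS placement code.
 * Models the behavior I *think* I observed: the writer's local
 * datanode gets replica #1, then the namenode fills the remaining
 * slots (up to the replication factor) with whichever other
 * datanodes it considers first.
 */
public class GuessedReplicaPlacement {

    static List<String> chooseTargets(String clientNode,
                                      List<String> dataNodes,
                                      int replication) {
        List<String> targets = new ArrayList<String>();
        // Replica #1: the datanode running on the client machine, if any.
        if (dataNodes.contains(clientNode)) {
            targets.add(clientNode);
        }
        // Remaining replicas: first come, first served from the rest.
        for (String node : dataNodes) {
            if (targets.size() >= replication) {
                break;
            }
            if (!targets.contains(node)) {
                targets.add(node);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> dataNodes = new ArrayList<String>();
        dataNodes.add("athlon64-3500");   // namenode + local datanode + client
        dataNodes.add("p4-3ghz");         // gigabit
        dataNodes.add("xeon-1");          // 100 Mbit
        dataNodes.add("xeon-2");          // 100 Mbit

        // With the default replication of 2, the Xeons never get a full share.
        System.out.println(chooseTargets("athlon64-3500", dataNodes, 2));
        // prints: [athlon64-3500, p4-3ghz]

        // With replication 4, every box would hold a copy.
        System.out.println(chooseTargets("athlon64-3500", dataNodes, 4));
        // prints: [athlon64-3500, p4-3ghz, xeon-1, xeon-2]
    }
}

If the choice of the non-local targets varies from block to block, that would also explain why the Xeons still picked up a few chunks while the P4 got the bulk of them, and why bumping replication to 4 would have put a full copy on every box.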
The good news: CPU usage was only at 50% on my AMD 3500+, and that was while it was copying data from another internal HD through the NDFS client to the internal datanode, running the namenode, and running the datanode, all on the same box. Does it now work with a separate namenode?

I'm getting ready to run Nutch on Linux full time, if I can ever get the damn driver for my HighPoint 2220 RAID card to work with SuSE, any SuSE; the drivers don't work with dual-core CPUs or something??? They are working on it, so for now I'm stuck with Fedora 4 until they fix it. So it's not ready for testing yet. I'll let you know when I can test it in a full Linux environment.

Wow, that was a long one!!!

-Jay