Re: Fw: Re: near-term plan

Piotr Kosiorowski Fri, 05 Aug 2005 05:06:25 -0700

I am not sure exactly what you did in this test, but I understand you
were using the jar file I prepared (Nutch from trunk plus the NDFS
patches). Since Andrzej applied those patches to trunk some time ago, we
can assume you were using the NDFS code from trunk.
Because a lot of work has gone into the mapreduce branch, it would be good
to test it with the mapreduce branch code.
Regards
Piotr


On 8/5/05, webmaster <[EMAIL PROTECTED]> wrote:
> 
> ---------- Forwarded Message -----------
> From: "webmaster" <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Thu, 4 Aug 2005 19:42:53 -0500
> Subject: Re: near-term plan
> 
> I was using a nightly build that Piotr had given me, nutch-nightly.jar
> (actually it was nutch-dev0.7.jar or something of that nature). I tested it
> on the Windows platform with 5 machines: 2 quad P3 Xeons at 100 Mbit, 1
> Pentium 4 3 GHz with Hyper-Threading, 1 AMD Athlon XP 2600+ and 1 Athlon 64
> 3500+. All have 1 GB or more of RAM. Now I have my big server, and if you
> have worked on NDFS since the beginning of July I'll test it again. My big
> server's HD array is very fast (200+ MB/s), so it will be able to saturate
> gigabit better.
>
> The P4 and the 2 AMD machines are hooked into the switch at gigabit, and the
> 2 Xeons are hooked into my other switch at 100 Mbit, which has a gigabit
> uplink to my gigabit switch. So both Xeons would constantly be saturated at
> 11 MB/s, while the P4 was able to reach higher speeds of 50-60 MB/s with its
> internal RAID 0 array (dual 120 GB drives). My main PC (Athlon 64 3500+) was
> the namenode, a datanode, and also the NDFS client.
>
> I could not get Nutch to work properly with NDFS. It was set up correctly
> and it "kinda" worked, but it would crash the namenode when I was trying to
> fetch segments in the NDFS filesystem, index them, or do much of anything.
> So I copied all my segment directories, indexes, content, whatever it was
> (1.8 GB), and some DVD images onto NDFS. My primary machine and Nutch run
> off 10000 rpm disks in RAID 0 (2x 36 GB Raptors), which can output about
> 120 MB/s sustained.
>
> Here is what I found out (on Windows): if I don't start a datanode on the
> namenode with the conf pointing to 127.0.0.1 instead of its outside IP, the
> namenode will not copy data to the other machines. If I am running a
> datanode on the namenode, data will replicate from that datanode to the
> other 3 datanodes. I tried this a hundred ways to make it work with an
> independent namenode, without luck.
>
> The way I saw data go across my network: I would put data into NDFS, the
> namenode would request a datanode, find the internal datanode, and copy data
> only to it. After that, while the datanode was still copying data from my
> other HDs into chunks on the RAID array, it would replicate to the P4 via
> gigabit at 50-60 MB/s, and then from the P4 to the Xeons, kind of
> alternating between them. I only had replication at the default of 2 and
> about 100 GB to copy in, so the copy onto the internal RAID array finished
> fairly quickly, then replication to the P4 finished, and the Xeons got a
> little bit of data, but not nearly as much as the P4. My guess is that it
> only needs 2 copies: the first copy was the datanode on the internal
> machine, the second was the P4 datanode. The Xeons only had a slower
> connection, so they didn't receive chunks as fast as the P4 could, and the
> P4 had enough space for all the data, so it worked out. I should have set
> replication to 4.
>
> The AMD Athlon XP 1900+ was running SUSE Linux 9.3, and it would crash the
> namenode on Windows if I connected it as a datanode, so that one didn't get
> tested. I was able to push 50-60 MB/s to 1 machine, but it seemed it would
> not replicate data to multiple machines at the same time. I would have
> thought it would output to the Xeons at the same time as the P4, giving the
> Xeons 20% of the data and the P4 80% or something like that, but it could be
> that they just aren't fast enough to request data before the P4 receives its
> 32 MB chunks every 1/2 second?
>
> The good news: CPU usage was only at 50% on my AMD 3500+, and that was while
> it was copying data to the internal datanode from the NDFS client (reading
> from another internal HD), running the namenode, and running the datanode
> internally.
>
> Does it now work with a separate namenode? I'm getting ready to run Nutch on
> Linux full time, if I can ever get the damn driver for my HighPoint 2220
> RAID card to work with SUSE, any SUSE. The drivers don't work with dual-core
> CPUs or something??? They are working on it; for now I'm stuck with Fedora 4
> until they fix it. So it's not ready for testing yet. I'll let you know when
> I can test it in a full Linux environment.
> wow that was a long one!!!
> -Jay
> ------- End of Forwarded Message -------
> 
> 
> --
> Pound Web Hosting www.poundwebhosting.com
> (607)-435-3048
>
