Anyway, why would it slow things down if it converges, let's say, 100 times faster (in terms of iterations) and you are able to run memcached or some other shared system (e.g. Voldemort) with as many instances as there are MR hosts, i.e. a memcached server on each one of them?
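
Concretely, something like this is what I have in mind. A rough sketch only (assuming spymemcached; the host list, key scheme and class name are made up), showing how a map task could read and publish the freshest rank of a node through the shared memcached cluster instead of waiting a whole MR iteration:

import java.io.IOException;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class SharedRankStore {

    private final MemcachedClient client;

    public SharedRankStore() throws IOException {
        // One memcached instance co-located with each MR host; the client
        // hashes keys across all of them, so every task sees the same value.
        client = new MemcachedClient(
            AddrUtil.getAddresses("node1:11211 node2:11211 node3:11211"));
    }

    // Read the freshest rank published by any task so far, falling back
    // to the value carried over from the previous iteration.
    public double getRank(long nodeId, double previousIteration) {
        Object cached = client.get("rank:" + nodeId);
        return cached != null
            ? Double.parseDouble((String) cached)
            : previousIteration;
    }

    // Publish this node's new rank so other tasks see it immediately,
    // within the same iteration.
    public void putRank(long nodeId, double rank) {
        client.set("rank:" + nodeId, 0, Double.toString(rank));
    }
}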
I understand what you are saying, but the theory doesn't really sink in for me... You mean that the latency of CPU + disk I/O is something like 10,000 times lower (or perhaps more) than the latency of calling a remote system over sockets? I can agree with that. Please point me to some code which uses MR so I can examine and test it myself, or use the back of your envelope and describe what I need to do to make it happen.

What system are you using to get the inlinks/outlinks of a node? We build the matrix beforehand using Lucene and rsync it out to all machines. Every MR job then uses the same static index (a rough sketch of the lookup is at the end of this mail).

Cheers

//Marcus

On Sat, Jul 4, 2009 at 1:25 AM, Marcus Herou <[email protected]> wrote:

> When speaking in terms of Hadoop, that is, I guess...? But normally,
> running in a single JVM, this is the case, right?
>
> /M
>
> On Sat, Jul 4, 2009 at 1:17 AM, Ted Dunning <[email protected]> wrote:
>
>> No. It should not want that.
>>
>> On Fri, Jul 3, 2009 at 2:13 PM, Marcus Herou <[email protected]> wrote:
>>
>>> Shouldn't N2 want to be aware of the freshest possible state of N1?

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[email protected]
http://www.tailsweep.com/
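
P.S. The sketch of the static-index lookup mentioned above. Simplified (assuming a Lucene 3.x-style API and one document per node; the index path and the nodeId/outlink field names are made up), roughly how each map task resolves outlinks against the rsync'ed index:

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class LinkIndex {

    private final IndexSearcher searcher;

    public LinkIndex(String path) throws Exception {
        // The same rsync'ed index sits at the same local path on every MR
        // host, so every job opens it read-only from the local disk.
        searcher = new IndexSearcher(
            IndexReader.open(FSDirectory.open(new File(path))));
    }

    // Look up the outlinks of one node; one Lucene document per node,
    // with a multi-valued "outlink" field, one value per outlink.
    public String[] outlinks(String nodeId) throws Exception {
        TopDocs hits = searcher.search(
            new TermQuery(new Term("nodeId", nodeId)), 1);
        if (hits.totalHits == 0) {
            return new String[0];
        }
        Document doc = searcher.doc(hits.scoreDocs[0].doc);
        return doc.getValues("outlink");
    }
}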
