Anyway, why would it slow things down if it converges, let's say, 100 times
faster (in terms of iterations) and you are able to have a memcached (or
whatever shared system, e.g. Voldemort) cluster whose size equals the number
of MR hosts, i.e. a memcached server on each one of them?

I understand what you are saying, but the theory does not really sink in...
You mean that the latency of CPU + disk I/O is something like 10,000 times
lower (or perhaps more) than the latency of calling a remote system via
sockets? I can agree with that.
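
For what it's worth, here is a quick back-of-the-envelope in Python showing why that per-lookup gap can swamp a 100x reduction in iterations. All the numbers (lookup latencies, lookups per iteration, iteration counts) are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: when does converging in 100x fewer iterations
# beat doing every state lookup over the network?
# All numbers below are assumed, not measured.

LOCAL_LOOKUP_S = 100e-9       # ~100 ns: in-process, in-memory lookup (assumed)
REMOTE_LOOKUP_S = 1e-3        # ~1 ms: socket round trip to memcached (assumed)
LOOKUPS_PER_ITER = 1_000_000  # lookups each iteration performs (assumed)

def lookup_time(iterations, lookup_s):
    """Total time spent on state lookups over the whole run."""
    return iterations * LOOKUPS_PER_ITER * lookup_s

local = lookup_time(10_000, LOCAL_LOOKUP_S)  # many cheap iterations
remote = lookup_time(100, REMOTE_LOOKUP_S)   # 100x fewer, but remote lookups

print(f"local:  {local:,.0f} s")
print(f"remote: {remote:,.0f} s")
```

With these assumed numbers the remote variant is still 100x slower overall: the 10,000x per-lookup penalty outweighs the 100x saving in iterations. The break-even point depends entirely on how many shared-state lookups each iteration actually performs.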

Please point out some code which uses MR so I can examine and test it for
myself, or do the back-of-the-envelope math and describe what I need to do
to make it happen.
What system are you using to get the inlinks/outlinks of a node? We map the
matrix up beforehand using Lucene and rsync it out to all machines. Every MR
job then uses the same static index.
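
To make the access pattern concrete, here is a minimal sketch of that "build once, ship everywhere, read locally" scheme. A pickled dict stands in for the Lucene index, and the node names and file layout are made up for illustration:

```python
import os
import pickle
import tempfile

# Hypothetical link matrix: node -> outlinks. In the real setup this is
# a Lucene index; a dict is used here only to show the access pattern.
outlinks = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}

# "Build" phase: serialize the adjacency once. In the real setup this
# file would then be rsynced to every machine.
path = os.path.join(tempfile.mkdtemp(), "linkgraph.pkl")
with open(path, "wb") as f:
    pickle.dump(outlinks, f)

# "Map task" phase: every job opens the same static local snapshot,
# so lookups never cross the network.
with open(path, "rb") as f:
    graph = pickle.load(f)

def get_outlinks(node):
    """Read-only lookup against the local static snapshot."""
    return graph.get(node, [])

print(get_outlinks("a.html"))  # ['b.html', 'c.html']
```

The point is that after the rsync step every lookup in the inner loop is a local read; nothing touches a socket.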

Cheers
//Marcus




On Sat, Jul 4, 2009 at 1:25 AM, Marcus Herou <[email protected]>wrote:

> That is when speaking in terms of Hadoop, I guess...? But when running
> normally in a single JVM, this is the case, right?
>
> /M
>
>
> On Sat, Jul 4, 2009 at 1:17 AM, Ted Dunning <[email protected]> wrote:
>
>> No.  It should not want that.
>>
>> On Fri, Jul 3, 2009 at 2:13 PM, Marcus Herou <[email protected]
>> >wrote:
>>
>> > Shouldn't N2 want to be aware of the freshest possible state of N1?
>> >
>>
>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> [email protected]
> http://www.tailsweep.com/
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[email protected]
http://www.tailsweep.com/
