Ted,

thanks for your reply. This is still an early phase of the research,
so I wouldn't like to spend much time on infrastructure (I need to
sleep too ;) ). The simplest solution that works is fine for now.
I'll wait for Ed's implementation.

Your mail actually made me think about my perception of the
map-reduce model and the Hadoop implementation. I was thinking that
most of the time Hadoop should protect me from worrying about data
access time, bandwidth, etc., even if that means the computation will
be, let's say, several times slower than it would be in an optimal
implementation. I assume you're probably talking about the optimal
one, or at least a good one, and I agree with you.

Of course, Hadoop can't hide this completely; I'd still have to
follow some guidelines (use a sensible number of mappers/reducers,
use combiners, make splits large enough that a mapper runs for a
couple of minutes, and so on). Hadoop should try to cut down the
bandwidth (by spawning a mapper close to the data, etc.).
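
For example, the knobs I mean look roughly like this with the
org.apache.hadoop.mapred API (MyJob, MyMapper and MyReducer are
placeholders, and the numbers are made up):

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf(MyJob.class);
  conf.setMapperClass(MyMapper.class);
  // a combiner shrinks map output locally before it hits the network
  conf.setCombinerClass(MyReducer.class);
  conf.setReducerClass(MyReducer.class);
  conf.setNumMapTasks(40);    // only a hint; the InputFormat decides
  conf.setNumReduceTasks(8);  // roughly one or two per core
  JobClient.runJob(conf);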

Ordinary matrix multiplication makes this difficult because each
element from one matrix will be multiplied by all the elements from
the other matrix. Unfortunately, not every problem can be split the
way word counting can, so that no data has to move between nodes.
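
To make the data movement concrete, here is a hypothetical mapper for
the usual one-pass approach to C = A x B (A is m x n, B is n x p);
the "A,i,j,value" input format and the hard-coded dimensions are my
assumptions, not taken from any real implementation:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.*;

  public class MatrixMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private final int m = 1000, p = 1000;  // made-up dimensions

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      String[] t = value.toString().split(",");
      if (t[0].equals("A")) {
        // A[i][j] feeds every C[i][k], so it is sent to p reducers
        for (int k = 0; k < p; k++)
          out.collect(new Text(t[1] + "," + k),
                      new Text("A," + t[2] + "," + t[3]));
      } else {
        // B[j][k] feeds every C[i][k], so it is sent to m reducers
        for (int i = 0; i < m; i++)
          out.collect(new Text(i + "," + t[2]),
                      new Text("B," + t[1] + "," + t[3]));
      }
    }
  }

Each input element gets replicated m or p times, which is exactly the
bandwidth blowup that word counting avoids.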

This is probably single-machine-developer inside me complaining :)
I have to consider better ways to partition my problem(s) eventually...

Again, thanks for your mail. I have a few more words for you privately.

-- 
regards,
 Milan
