It might not be a reasonable expectation in general, but for this
particular problem I think the amount of memory needed is unreasonable,
since the same recommender is happy with much less memory when run
without MapReduce.
I agree with you; I think the problem is how Hadoop splits the task.
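For what it's worth, when the splits are the suspect, the per-mapper
footprint can sometimes be brought down by capping the input split size
and the child JVM heap. The sketch below is only an illustration, not
taken from the job in question, and the mapred.* property names are the
old-style ones, which differ across Hadoop versions:

  import org.apache.hadoop.conf.Configuration;

  public class SplitTuningSketch {
      public static void main(String[] args) {
          Configuration conf = new Configuration();
          // Heap handed to each child task JVM; the 500MB discussed below
          // would correspond to roughly -Xmx512m here.
          conf.set("mapred.child.java.opts", "-Xmx512m");
          // Cap the input split size so each mapper sees a smaller slice
          // of the input (64MB is an arbitrary illustrative value).
          conf.setLong("mapred.max.split.size", 64L * 1024 * 1024);
          // ... the rest of the job setup would go here ...
      }
  }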
Another problem is the expectation that a map-reduce job should be happy
with 500MB of memory. I don't think that is a universally reasonable
expectation. The real expectation for scalable solutions should be that
as the problem scales, the cluster size is allowed to scale, while the
size of individual cluster members does not increase. It is reasonable
to expect that if you scale up the problem but not the number of nodes
in the cluster, each node may need more and more memory. It is also
reasonable to expect that the base size of each node may be fairly hefty
for some problems (as long as it doesn't increase when
problemSize / clusterSize is held constant).
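To put the same point in arithmetic: if per-node memory is roughly a
fixed base plus (problemSize / clusterSize) * bytes-per-item, it stays
flat as long as the problem and the cluster grow together, and grows
only when they don't. A toy sketch, with entirely made-up numbers rather
than measurements from this job:

  public class ScalingSketch {
      // Rough per-node memory model: fixed base plus a share of the problem.
      static long perNodeBytes(long problemItems, int clusterNodes,
                               long bytesPerItem, long baseBytes) {
          return baseBytes + (problemItems / clusterNodes) * bytesPerItem;
      }

      public static void main(String[] args) {
          long bytesPerItem = 200;      // hypothetical per-item cost
          long baseBytes = 256L << 20;  // hypothetical fixed base (256MB)
          // Problem and cluster doubled together: per-node memory unchanged.
          System.out.println(perNodeBytes(10_000_000L, 10, bytesPerItem, baseBytes));
          System.out.println(perNodeBytes(20_000_000L, 20, bytesPerItem, baseBytes));
          // Problem doubled but cluster held fixed: per-node memory grows.
          System.out.println(perNodeBytes(20_000_000L, 10, bytesPerItem, baseBytes));
      }
  }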