For the simplest case, the natural-number factorial, the arrangement of the job may seem obvious, but not entirely. A new variable type has to be created to store the super large numbers (something like http://gmplib.org/), and then a package has to be developed so that such values can be carried through map/reduce.
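As a rough sketch of what that might look like (my own illustration, assuming Hadoop Streaming, where mappers read lines from stdin and emit tab-separated key/value pairs), a mapper could multiply a range of factors and ship the partial product through the shuffle as a decimal string, with Python's built-in arbitrary-precision int standing in for a GMP-backed type:

#!/usr/bin/env python
# Hypothetical streaming mapper: each input line holds a range "lo hi".
# The mapper multiplies the integers in that range and emits the partial
# product as a decimal string, so the shuffle only ever sees plain text.
import sys

def partial_product(lo, hi):
    # Python's built-in int is arbitrary precision, standing in for GMP.
    acc = 1
    for k in range(lo, hi + 1):
        acc *= k
    return acc

for line in sys.stdin:
    lo, hi = map(int, line.split())
    # One shared key so a single reducer can combine all partial products.
    print("product\t%d" % partial_product(lo, hi))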

There are other factorials as well, such as the floating-point factorial, the hyperfactorial, the complex-number factorial, the exponential factorial, and the array factorial, which make investigating the distribution of candidate numbers and estimating the complexity of the subtasks as complicated as the multiplication itself.

If there is a single reducer to do the final step, then in theory that reducer should be capable of doing the whole job on its own, since the overall memory requirement is bounded by the last multiplication, which that reducer has to perform anyway.
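A quick back-of-the-envelope check (my own, via Stirling's approximation) shows why that last multiplication dominates: the result itself is enormous.

import math

def factorial_digits(n):
    # Decimal digits of n!, estimated as lgamma(n+1)/ln(10) + 1.
    return int(math.lgamma(n + 1) / math.log(10)) + 1

for n in (10**3, 10**6, 10**9):
    print(n, factorial_digits(n))
# 1000!       has about 2.6 thousand digits
# 1000000!    has about 5.6 million digits
# 1000000000! has about 8.6 billion digits -- gigabytes just to hold the result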

Fortunately, I am not interested in this problem and do not need to think about it any further.

On 2010-10-14 18:48, Greg Roelofs wrote:
Shi Yu <[email protected]> wrote:

I don't think so. If the problem were to find a super large prime
number, it could be a nice map/reduce task, but simply calculating the
factorial is a task whose complexity increases monotonically. If it is
split into multiple tasks, how do you shuffle the complexity?
I can think of at least a couple of reasonable ways to do so, particularly
if every mapper knows the total number of maps upfront.  The amount of work
done by each mapper would be almost the same.  (Remember, multiplication is
commutative. ;-) )
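One way to read that (a sketch of my own, not code from the thread): with M maps, mapper i takes every M-th factor of n!, so each partition mixes small and large factors and the partial products come out roughly the same size.

def mapper_partial_product(n, num_maps, map_index):
    # Factors map_index+1, map_index+1+M, map_index+1+2M, ... of n!.
    acc = 1
    for k in range(map_index + 1, n + 1, num_maps):
        acc *= k
    return acc

def factorial(n, num_maps=4):
    # Multiplication is commutative and associative, so the order in which
    # the partial products are combined does not matter.
    result = 1
    for i in range(num_maps):
        result *= mapper_partial_product(n, num_maps, i)
    return result

assert factorial(20, num_maps=4) == 2432902008176640000  # 20!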

Some easy tasks will
finish earlier and just sit idle waiting for the complicated
ones.
Not if you split up the work intelligently.

Later, the reducer still needs to carry most of the burden of multiplying
all the numbers back together, which probably won't gain much efficiency
from the map/reduce approach.
This is arguably a map-only (well, map-mostly) situation.  Consider a first-
stage job wherein each of 1000 maps produces a billion-digit result; perhaps
ten maps would suffice for a second stage, which could then have a single
reducer to handle multiplication of the final set of 100-billion-digit values.
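Sketched in-process (my own illustration, scaled down so it actually runs, with Python lists standing in for the intermediate output of real Hadoop jobs), the staging looks like this:

import math

def combine(partials, group_size):
    # Multiply partial products in groups of group_size (one task per group).
    return [math.prod(partials[i:i + group_size])
            for i in range(0, len(partials), group_size)]

n, num_maps = 1000, 100
# Stage 1: each "map" multiplies every num_maps-th factor of n!.
stage1 = [math.prod(range(i + 1, n + 1, num_maps)) for i in range(num_maps)]
# Stage 2: 100 partial products -> 10 larger ones.
stage2 = combine(stage1, group_size=10)
# A single final reducer multiplies the remaining handful of values.
result = combine(stage2, group_size=len(stage2))[0]
assert result == math.prod(range(1, n + 1))  # 1000!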

No doubt there are better ways to divide up the work, depending on whether
speed or network bandwidth or consumption of intermediate storage space is
most critical.  I'll leave the calculations to those who care about the
result. :-)

Greg
