On Jan 22, 2007, at 3:55 AM, Andrzej Bialecki wrote:

Why is that? Is it because people don't know about it, or other models for tackling out-of-core tasks are more popular, or it's not applicable to most problems out there? I'm not sure. From my experience I know that it's often not obvious (and sometimes impossible?) how to decompose an existing algorithm so that it fits in the map-reduce paradigm.

I would guess the primary reason is that until Hadoop came along there wasn't a way to write or run map/reduce applications. Map/reduce is still pretty new. MPI was first released 12 years ago. PVM is even older. A lot of the people in the supercomputing community are just learning about map/reduce.

Do people use Hadoop for tasks outside the well-known class of web-related problems?

One obvious application that uses map/reduce is log analysis. If you have gigabytes or terabytes of logs to scan and analyze, map/reduce helps a lot.
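
As a concrete illustration, here is a minimal sketch of such a job against the org.apache.hadoop.mapred API. The class name, the assumed log layout (severity as the third whitespace-separated field), and the input/output paths are all placeholders; it is roughly the shape of the standard WordCount example, just pointed at logs:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical example: count log lines by severity (INFO, WARN, ERROR, ...).
public class LogLevelCount {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text level = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      // Assumes lines look like "2007-01-22 03:55:01 ERROR ...", i.e. the
      // severity is the third whitespace-separated field.
      String[] fields = value.toString().split("\\s+");
      if (fields.length > 2) {
        level.set(fields[2]);
        output.collect(level, ONE);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(LogLevelCount.class);
    conf.setJobName("log-level-count");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);   // pre-aggregate on the map side
    conf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Because the reducer is also registered as a combiner, most of the counting happens map-side before anything crosses the network.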

In terms of non-obvious applications, for one of the Yahoo Hack Days, I wrote a distributed dancing links (http://en.wikipedia.org/wiki/Dancing_Links) solver and a pentomino (http://en.wikipedia.org/wiki/Pentomino) tiling framework based on it. In particular, I solved the 9x10 grid using the 18 one-sided pentominoes. The approach I used expanded the search tree to depth N and wrote out each of the possible prefixes as a line. At a depth of 5, my problem generated 150,000 prefixes. I used map/reduce and the text input format to split the input space into a bunch of pieces and generate all of the solutions for each prefix, and a single reduce to collect the solutions.

So the input file looked like:
1,0,0,1,0,
1,0,0,1,1,
1,0,0,1,2,
1,0,0,1,3,
1,0,0,1,4,
1,0,0,1,5,
..

and the output looked like this (each letter is a piece and the grid shows the layout):

1,0,0,1,1,
uuuiiiiit
uxupppttt
xxxwppPPt
vxwwffPPn
vwwffYlPn
vvvzfYlnn
yzzzYYlnL
yzZZFYllL
yyZFFFNNL
yZZFNNNLL

1,0,0,1,1,
uuuiiiiit
uxupppttt
xxxwppFNt
vxwwFFFNN
vwwllFnYN
vvvzlnnYN
yzzzlnYYL
yzZZlnfYL
yyZPPPffL
yZZPPffLL

...
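
For what it's worth, the map side of this is tiny. Here is a rough sketch, again against the org.apache.hadoop.mapred API; the PrefixSolver interface is a hypothetical stand-in for the actual dancing links code, which is omitted:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch only: each input record is one prefix line such as "1,0,0,1,1,".
// The mapper finishes the search below that prefix and emits
// (prefix, solution grid) pairs; a single reduce collects them.
public class PentominoMap extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  /** Hypothetical stand-in for the real dancing links solver (not shown). */
  public interface PrefixSolver {
    /** Returns every complete tiling whose first choices match the prefix. */
    List<String> solveWithPrefix(String prefix);
  }

  // The real solver (dancing links over the 9x10 board with the 18
  // one-sided pieces) would be built in configure(JobConf); omitted here.
  private PrefixSolver solver;

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {
    String prefix = value.toString().trim();
    for (String solution : solver.solveWithPrefix(prefix)) {
      output.collect(new Text(prefix), new Text(solution));
      reporter.progress();   // long branches should still report progress
    }
  }
}

Each map task just works through its share of the 150,000 prefix lines independently, which is what makes the search embarrassingly parallel.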

I chose that particular problem because in Knuth's paper on dancing links (http://xxx.lanl.gov/PS_cache/cs/pdf/0011/0011047.pdf), on page 16, he started to solve it and determined it would take a couple of months using his solver. Using Hadoop on 20 nodes, it takes a couple of hours. Fun stuff. I should probably package up the solver and submit it to the Hadoop examples. As a side benefit, I also did a sudoku solver using my dancing links solver. There isn't any point in running the sudoku solver distributed, though, since it can solve huge puzzles (as in 42x42) in less than 1 second.

-- Owen
