On Jan 22, 2007, at 3:55 AM, Andrzej Bialecki wrote:
> Why is that? Is it because people don't know about it, or other
> models for tackling out-of-core tasks are more popular, or it's not
> applicable to most problems out there? I'm not sure. From my
> experience I know that it's often not obvious (and sometimes
> impossible?) how to decompose an existing algorithm so that it fits
> in the map-reduce paradigm.
I would guess the primary reason is that until Hadoop came along
there wasn't a way to write or run map/reduce applications.
Map/reduce is still pretty new. MPI was first released 12 years ago.
PVM is even older. A lot of the people in the supercomputing
community are just learning about map/reduce.
> Do people use Hadoop for tasks outside the well-known class of
> web-related problems?
One obvious application that uses map/reduce is log analysis. If you
have gigabytes or terabytes of logs to scan and analyze, map/reduce
helps a lot.
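As a (very simplified) sketch of what I mean: counting log lines by
severity is just a mapper that emits (severity, 1) and a reducer that
sums. Something like the following, where the log format and the
exact Hadoop interfaces are illustrative rather than taken from a
real job:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;

  public class LogSeverityCount {

    // map: one log line in, (severity, 1) out; assumes the first
    // whitespace-delimited token of each line is a severity level
    public static class SeverityMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text severity = new Text();

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        String[] fields = line.toString().trim().split("\\s+", 2);
        if (fields.length > 0 && fields[0].length() > 0) {
          severity.set(fields[0]);
          context.write(severity, ONE);
        }
      }
    }

    // reduce: sum the ones emitted for each severity
    public static class SumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text severity, Iterable<IntWritable> counts,
                            Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
          sum += count.get();
        }
        context.write(severity, new IntWritable(sum));
      }
    }
  }

The same shape covers most scan-and-aggregate questions over logs;
only the key you emit changes.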
In terms of non-obvious applications, for one of the Yahoo Hack Days,
I wrote a distributed dancing links
(http://en.wikipedia.org/wiki/Dancing_Links) solver and a pentomino
(http://en.wikipedia.org/wiki/Pentomino) tiling framework based on it.
In particular, I solved the 9x10 grid using the 18 one-sided
pentominoes.
The approach I used expanded the search tree to depth N and wrote out
each possible prefix as a line. At a depth of 5, my problem generated
150,000 prefixes. I used map/reduce with the text input format to
split the input space into a bunch of pieces and generate all of the
solutions for each prefix, and a single reduce to collect the
solutions (there is a sketch of the map side after the example output
below).
So the input file looked like:
1,0,0,1,0,
1,0,0,1,1,
1,0,0,1,2,
1,0,0,1,3,
1,0,0,1,4,
1,0,0,1,5,
..
and the output looked like (each letter is a piece and the grid shows
the layout):
1,0,0,1,1,
uuuiiiiit
uxupppttt
xxxwppPPt
vxwwffPPn
vwwffYlPn
vvvzfYlnn
yzzzYYlnL
yzZZFYllL
yyZFFFNNL
yZZFNNNLL
1,0,0,1,1,
uuuiiiiit
uxupppttt
xxxwppFNt
vxwwFFFNN
vwwllFnYN
vvvzlnnYN
yzzzlnYYL
yzZZlnfYL
yyZPPPffL
yZZPPffLL
...
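Schematically, the map side just takes one prefix per input line and
finishes the search below it. A rough sketch of that mapper, with a
stand-in Solver interface in place of the actual dancing links code:

  import java.io.IOException;
  import java.util.List;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class PrefixMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Stand-in for the dancing links search: given a prefix of piece
    // placements, return every complete tiling below it.
    public interface Solver {
      List<String> solveFrom(String prefix);
    }

    private Solver solver;  // would be created in setup() in a real job

    @Override
    protected void map(LongWritable offset, Text prefixLine, Context context)
        throws IOException, InterruptedException {
      // each input line is one search prefix, e.g. "1,0,0,1,4,"
      String prefix = prefixLine.toString().trim();
      if (prefix.length() == 0) {
        return;
      }
      // finish the search below this prefix and emit every tiling found
      for (String solution : solver.solveFrom(prefix)) {
        context.write(new Text(prefix), new Text(solution));
      }
    }
  }

The reduce side doesn't need any logic at all; it only exists to
gather the solutions from all the maps into one place.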
I chose that particular problem because in Knuth's paper on dancing
links (http://xxx.lanl.gov/PS_cache/cs/pdf/0011/0011047.pdf), on page
16, he started to solve it and determined it would take a couple of
months using his solver. Using Hadoop on 20 nodes, it takes a couple
of hours. Fun stuff. I should probably package up the solver and
submit it to the Hadoop examples. As a side benefit, I also did a
sudoku solver using my dancing links solver. There isn't any point in
running the sudoku solver distributed, though, since it can solve
huge puzzles (as in 42x42) in less than a second.
-- Owen