I'm sure there's a doc on this somewhere and if someone can point me
to it I'd be quite grateful:

What I'm looking to do is analyze the output of n prior MapReduce (MR)
runs and then see which results showed up in all of them.

For example: do a word count run on sci-fi books and, at some later
point, a run on romance novels. Then, at some point further in the
future, go back and find all the statistically significant words that
appeared in both. These are three totally separate MR runs.
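
To make the third step concrete, here is roughly what I mean, as a
plain Python sketch that runs outside Hadoop. Everything named here is
my own assumption, not a Hadoop API: I'm assuming each earlier run left
lines of the form word<TAB>count, and I'm using a simple min_count
cutoff as a stand-in for "statistically significant" until I know what
the right test is.

```python
from collections import Counter


def parse_counts(lines):
    """Parse 'word<TAB>count' lines (assumed output format) into a Counter."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.rsplit("\t", 1)
        counts[word] += int(count)  # sum duplicates, e.g. from several part files
    return counts


def common_significant_words(runs, min_count=2):
    """Return the words whose count is >= min_count in every run.

    min_count is a hypothetical placeholder for a real significance test.
    """
    significant = [
        {word for word, count in counts.items() if count >= min_count}
        for counts in runs
    ]
    return set.intersection(*significant) if significant else set()


# Hypothetical sample output from the two earlier runs.
scifi = parse_counts(["robot\t5", "love\t3", "ship\t4"])
romance = parse_counts(["love\t9", "ship\t1", "heart\t6"])

print(sorted(common_significant_words([scifi, romance])))  # prints ['love']
```

This only works when the old output fits in memory on one machine; what
I'm after is the Hadoop way of doing the same intersection as a third
job over the saved output of the first two.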

I'm sure this is a common and easy-to-handle situation; I just don't
have my head around Hadoop enough yet to know what I need to be
searching for to get the right docs.


-- 
- kate = masukomi
http://weblog.masukomi.org/
