I'm sure there's a doc on this somewhere, and if someone can point me to it I'd be quite grateful.
What I'm looking to do is analyze the output of n prior MR runs and then see where the same thing showed up in all of them. For example: do a word count run on sci-fi books, then at some later point a run on romance novels, and then, at some still later point, go back and find all the statistically significant words that appeared in both. These are three totally separate MR runs.

I'm sure this is a common and easy-to-handle situation; I just don't have my head around Hadoop enough yet to know what I need to be searching for to get the right docs.

--
- kate = masukomi
http://weblog.masukomi.org/
