Hi,
Let's say I want to run a MapReduce job on a series of text files (for
example x.txt, y.txt and z.txt).
Given the following mapper class in Python (from WordCount.py):
class WordCountMap(Mapper, MapReduceBase):
    one = IntWritable(1)  # removed
    def map(self, key, value, output, reporter):
        for w in value.toString().split():
            output.collect(Text(w), self.one)  # how can I modify this line?
Instead of emitting a pair of each word and the number one, as the example
does, is there a function I can call to emit the name of the file the word
came from?
That way I'd have pairs like <"water", "x.txt"> <"hadoop", "y.txt">
<"hadoop", "z.txt"> etc.
I took a look at the Javadoc, but I'm not sure I checked the right places.
Could someone point me in the right direction?
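To make the goal concrete, this is roughly the output I am after, written as
a plain-Python sketch. It does not use Hadoop at all: word_file_pairs and the
sample file contents are made up for illustration, and in the real job the
file name would have to come from whatever the framework exposes to the
mapper.

```python
def word_file_pairs(files):
    """Yield (word, filename) for every whitespace-separated word.

    `files` maps a file name to that file's text. Both the function name
    and the in-memory dict are stand-ins for illustration, not Hadoop API.
    """
    for name in sorted(files):
        for line in files[name].splitlines():
            for w in line.split():
                # One pair per word occurrence, tagged with its source file.
                yield (w, name)


# Made-up sample contents for the three files mentioned above.
sample = {
    "x.txt": "water",
    "y.txt": "hadoop",
    "z.txt": "hadoop",
}

pairs = list(word_file_pairs(sample))
print(pairs)
```

So the key stays the word, as in WordCount, and only the value changes from
a count to the file name.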
Thanks!
-SM