I am using Hadoop Streaming.
I figured out a way to do the combining inside the mapper. Is this the same as
using a separate combiner?
For example: the input is a list of words, one per line, and I want to count
how many times each word occurs.
The traditional mapper is:
while (<STDIN>) {
    chomp($_);
    $word = $_;
    # emit one "word<TAB>1" record per input word
    print "$word\t1\n";
}
........
Instead of using an additional combiner, I modified the mapper to use a hash:
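%hash = ();
while (<STDIN>) {
    chomp($_);
    $word = $_;
    # accumulate the count for this word in memory
    $hash{$word}++;
}
# emit one "word<TAB>count" record per distinct word
foreach $key (keys %hash) {
    print "$key\t$hash{$key}\n";
}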
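For comparison, this is roughly the separate combiner I have in mind. It is only a minimal sketch: it assumes the combiner receives the mapper's "word<TAB>count" lines on STDIN and is passed to streaming through the -combiner option, and add_line is just a helper name I made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: add one "word<TAB>count" line to the running totals.
# Lines without a numeric count are skipped.
sub add_line {
    my ($totals, $line) = @_;
    chomp $line;
    my ($word, $count) = split /\t/, $line, 2;
    return unless defined($count) && $count =~ /^\d+$/;
    $totals->{$word} += $count;
}

my %totals;
while (my $line = <STDIN>) {
    add_line(\%totals, $line);
}

# Emit one partial sum per word, in the same format the mapper uses.
foreach my $word (keys %totals) {
    print "$word\t$totals{$word}\n";
}
```

With this version the mapper stays as the simple one-record-per-word script, and the summing happens in a separate step.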
Is it the same as using a separate combiner?