Thanks. On the memory point, though: according to 
http://wiki.apache.org/hadoop/HadoopMapReduce , the Hadoop combiner also works 
in memory. Quote: "When the map operation outputs its pairs they are already 
available in memory. For efficiency reasons, sometimes it makes sense to take 
advantage of this fact by supplying a combiner class to perform a reduce-type 
function. ... A combine operation will start gathering the output in in-memory 
lists (instead of on disk), one list per word."
So my code works exactly the same as a Hadoop combiner in terms of memory 
usage. 



----- Original Message ----
From: lohit <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, July 21, 2008 3:46:33 PM
Subject: Re: [Streaming] I figured out a way to do combining using mapper, 
would anybody check it?

Yes, for this example it's the same. You might want to consider one more thing, 
though. Your code reads all of its input into memory and only then writes it 
out. So if your input split is very big, your hash will be big as well. Also, 
if reading the data into the hash takes longer than mapred.task.timeout, I 
think no status is reported to the JobTracker, which then assumes the task is 
gone and might kill it. 
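
One way to address both concerns is to cap the size of the hash and flush it whenever the cap is reached, reporting status along the way. The following is only a sketch: the names run_mapper and flush_counts and both limits are made up for illustration, and it assumes a Hadoop Streaming version that treats stderr lines starting with "reporter:status:" as status updates, which may not hold for every release.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical limits; tune them to the task's memory and timeout settings.
my $MAX_KEYS     = 100_000;   # flush once this many distinct words are held
my $STATUS_EVERY = 10_000;    # report status every this many input lines

# Print every partial count as "word<TAB>count", then empty the hash.
sub flush_counts {
    my ($counts, $out) = @_;
    for my $word (sort keys %$counts) {
        print {$out} "$word\t$counts->{$word}\n";
    }
    %$counts = ();
}

# Count words read from $in, keeping at most $max_keys words in memory.
sub run_mapper {
    my ($in, $out, $err, $max_keys, $status_every) = @_;
    my %counts;
    my $lines = 0;
    while (my $word = <$in>) {
        chomp $word;
        next if $word eq '';
        $counts{$word}++;
        $lines++;
        # Flush partial counts so the hash never grows past the cap.
        flush_counts(\%counts, $out) if keys(%counts) >= $max_keys;
        # Assumption: Streaming picks up "reporter:status:" lines on stderr.
        print {$err} "reporter:status:processed $lines lines\n"
            if $lines % $status_every == 0;
    }
    # Emit whatever is still held at end of input.
    flush_counts(\%counts, $out);
}

run_mapper(\*STDIN, \*STDOUT, \*STDERR, $MAX_KEYS, $STATUS_EVERY) unless caller;
```

Because a word can now be emitted more than once per mapper, the reducer still has to sum the partial counts, just as it would after a combiner.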

Thanks,
Lohit



----- Original Message ----
From: Gopal Gandhi <[EMAIL PROTECTED]>
To: [email protected]
Cc: [EMAIL PROTECTED]
Sent: Monday, July 21, 2008 2:35:45 PM
Subject: [Streaming] I figured out a way to do combining using mapper, would 
anybody check it?

I am using Hadoop Streaming. 
I figured out a way to do combining in the mapper. Is it the same as using a 
separate combiner?

For example: the input is a list of words, and I want to count how many times 
each word occurs. 
The traditional mapper is:

while (<STDIN>) {
  chomp ($_);
  $word = $_;
  print "$word\t1\n";   # emit "word<TAB>1" for every input line
}
.........

Instead of using an additional combiner, I modify the mapper to use a hash:

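%hash = ();
while (<STDIN>) {
  chomp ($_);
  $word = $_;
  $hash{$word} ++;      # accumulate the count in memory
}

# iterate over the keys only; a bare %hash would also yield the values
foreach $key (keys %hash){
  print "$key\t$hash{$key}\n";
}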

Is it the same as using a separate combiner?
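
For comparison, here is a rough sketch of a summing reducer that would accept the output of either mapper. The name run_reducer is made up for illustration, and the sketch assumes the reducer receives "word<TAB>count" lines already sorted by word. Whether a word arrives as several lines of 1 or as a single partial total, the final sum comes out the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the counts of consecutive lines that share a word.
# Assumes the input is sorted by word, as a reducer normally receives it.
sub run_reducer {
    my ($in, $out) = @_;
    my ($current, $total) = (undef, 0);
    while (my $line = <$in>) {
        chomp $line;
        my ($word, $count) = split /\t/, $line, 2;
        next unless defined $count;          # skip malformed lines
        if (defined $current && $word ne $current) {
            # The word changed, so the previous word's total is complete.
            print {$out} "$current\t$total\n";
            $total = 0;
        }
        $current = $word;
        $total += $count;
    }
    # Emit the total for the last word, if any input was seen.
    print {$out} "$current\t$total\n" if defined $current;
}

run_reducer(\*STDIN, \*STDOUT) unless caller;
```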



      
