Hi Danny,

Hmm, that makes me wonder whether I might be doing something wrong here. I imported just one .bz2 file into HDFS and then launched a map/reduce job with the following command:

/home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/contrib/streaming/hadoop-streaming.jar -input /user/hadoop/logs/2009/06/22/ -output /user/hadoop/out1 -mapper map.pl -file map.pl -reducer reduce.pl -file reduce.pl -jobconf mapred.reduce.tasks=1

The .bz2 file was in the /user/hadoop/logs/2009/06/22 directory, but the final output, part-00000 in /user/hadoop/out1, was meaningless. I was expecting key/value pairs, but all I got was a single count, for example 31,006. No errors were generated at all.

When I ran the same command on the uncompressed file, the output was fine and gave me the correct key/value pairs. Again, no errors were generated.
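One way to tell whether the mapper is being fed decompressed log lines or the raw .bz2 bytes is to look at the first bytes of the data. The Perl sketch below (the script name check_bz2.pl is hypothetical) checks a local file for the bzip2 header: the bytes "BZh" followed by a block-size digit. Run against a local copy of the job input (for example, one fetched with `hadoop fs -get`), it should report bzip2 data. Run against the output of a map-only pass-through job (for example `-mapper /bin/cat -jobconf mapred.reduce.tasks=0`), a "bzip2 data" result would suggest that the records reached the mapper still compressed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A bzip2 stream starts with the bytes "BZh" followed by a block-size
# digit '1'..'9'. Return 1 if the file at $path starts that way, else 0.
sub looks_like_bzip2 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!\n";
    my $n = read($fh, my $header, 4);
    close($fh);
    return (defined $n && $n == 4 && $header =~ /\ABZh[1-9]\z/) ? 1 : 0;
}

# Usage (hypothetical script name): perl check_bz2.pl FILE...
for my $path (@ARGV) {
    printf "%s: %s\n", $path,
        looks_like_bzip2($path) ? "bzip2 data" : "not bzip2 data";
}
```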

Below are my map.pl and reduce.pl.

Thanks for your help,
Usman

_*map.pl*_

#!/usr/bin/perl -w
#
#

while (<STDIN>) {

   chomp;
   next if ( ! /^\d+/ );
   my @fields = split(/;/);
   my $cookie = $fields[11];
   print "$cookie\t1\n";

}
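In map.pl as posted, a line that starts with digits but has fewer than 12 `;`-separated fields leaves `$fields[11]` undefined, so the mapper prints an empty key followed by a tab and 1. Every such line then lands on the same empty key, and the reducer prints one large total. That would be consistent with the single number reported above, though it is only a guess at the cause. A minimal defensive variant is sketched below. The helper name map_record is hypothetical, and the 12-field requirement is taken from the `$fields[11]` lookup in the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one input line into "cookie\t1\n", or return nothing (undef) when the
# line should be skipped: it does not start with digits, has fewer than 12
# ';'-separated fields, or has an empty 12th field.
sub map_record {
    my ($line) = @_;
    return unless $line =~ /^\d+/;
    my @fields = split /;/, $line;
    return if @fields < 12;                  # $fields[11] would be undef
    my $cookie = $fields[11];
    return unless length $cookie;
    return "$cookie\t1\n";
}

my $skipped = 0;
while (my $line = <STDIN>) {
    chomp $line;
    my $out = map_record($line);
    if (defined $out) {
        print $out;
    } else {
        $skipped++;
    }
}

# STDERR ends up in the task logs, not in the job output.
print STDERR "map.pl: skipped $skipped lines\n" if $skipped;
```

With this version, a job fed data it cannot parse produces little or no output, plus a skipped-lines count in the task logs. The original would instead produce one total under an empty key.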


_*reduce.pl*_

#!/usr/bin/perl -w
#
#

while (<STDIN>) {

   chomp;
   ($key,$value)  = split(/\t/);
   $count{$key} += $value;
}

foreach $k (keys %count) {
   $c = $count{$k};
   print "$k\t$c\n";
}
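reduce.pl keeps a hash entry for every distinct cookie until all of its input has been read. Hadoop streaming sorts map output by key before it reaches the reducer, so an alternative is to keep only the current key and its running total and print each total when the key changes. The sketch below shows that alternative. It is not a fix for the .bz2 problem, and the sub name reduce_stream is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the values for each key, reading "key\tvalue" lines from $in and
# writing "key\ttotal" lines to $out. Assumes the input is grouped by key,
# which Hadoop streaming provides by sorting map output before the reduce.
sub reduce_stream {
    my ($in, $out) = @_;
    my ($current, $total);
    while (my $line = <$in>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $value;             # skip lines without a tab
        if (defined $current && $key ne $current) {
            print {$out} "$current\t$total\n";  # key changed: flush previous
            $total = 0;
        }
        $current = $key;
        $total  += $value;
    }
    print {$out} "$current\t$total\n" if defined $current;
}

reduce_stream(\*STDIN, \*STDOUT);
```

The sort step matters. Fed unsorted input, this reducer would print the same key more than once.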



Hi Usman,

I'm running 0.18.3 from hadoop.apache.org, and have no issues with bz2
files.   My experiments with these files have been through Pig.  Hope
this is useful to you.

Best regards,

Danny Gross

-----Original Message-----
From: Usman Waheed [mailto:usm...@opera.com]
Sent: Wednesday, June 24, 2009 10:09 AM
To: core-user@hadoop.apache.org
Subject: Re: Are .bz2 extensions supported in Hadoop 18.3

The version (18.3) I am running in my cluster is the tarball I got from
hadoop.apache.org.
So you are suggesting I use the Cloudera 18.3 release, which supports bzip2,
correct?

Thanks,
Usman

I believe the Cloudera 18.3 release supports bzip2.

On Wed, Jun 24, 2009 at 3:45 AM, Usman Waheed <usm...@opera.com> wrote:
Hi All,

Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3?
I tried, but interestingly the output was not what I expected compared
with what I got when my data was in uncompressed format.

Thanks,
Usman


