Edward:

I don't have access to the individual data nodes, so I can't install the pure-Perl module. I tried distributing it via the ADD FILE command, but that mangles the file name, so Perl won't load the module because the file name and package name don't match. Kinda frustrating, but it is really all about trying to work around an issue on Amazon's Elastic MapReduce. I love the service in general, but some issues are frustrating.


On Feb 15, 2010, at 6:05, Edward Capriolo <[email protected]> wrote:

On Mon, Feb 15, 2010 at 1:29 AM, Adam O'Donnell <[email protected]> wrote:
Hope this helps.

Carl

How about this... can I run a standard Hadoop streaming job against a
Hive table that is stored as a sequence file?  The idea would be to
break my Hive query into two separate tasks and do a Hadoop
streaming job in between, then pick up the Hive job afterwards.
Thoughts?

Adam


I actually did do this with a streaming job. The UDF was tied up with
the Apache/GPL licensing issues.

Here is how I did this. First, install geo-ip-perl on all datanodes.
Then run the query below, which pipes remote_ip through geo_state.pl:

 // qp, theDate and theHour are defined elsewhere in the calling code.
 ret = qp.run(
   " FROM ( "+
   " FROM raw_web_data_hour "+
   " SELECT transform( remote_ip ) "+
   " USING 'perl geo_state.pl' "+
   " AS ip, country_code3, region "+
   " WHERE log_date_part='"+theDate+"' and log_hour_part='"+theHour+"' " +
   " ) a " +
   " INSERT OVERWRITE TABLE raw_web_data_hour_geo PARTITION " +
   " (log_date_part='"+theDate+"',log_hour_part='"+theHour+"') "+
   " SELECT a.country_code3, a.region,a.ip,count(1) as theCount " +
   " GROUP BY a.country_code3,a.region,a.ip "
   );
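
For readers who want to see the full statement that the concatenation above produces, here is a minimal sketch of a helper that assembles the same query text. The class name GeoQueryBuilder, the method buildGeoQuery, and the character check on the partition values are assumptions added for illustration; they are not part of the original message, and the real format of log_date_part and log_hour_part may differ.

```java
public class GeoQueryBuilder {

    // Hypothetical helper (not from the original message): builds the same
    // Hive statement as the qp.run(...) call, as a single string.
    public static String buildGeoQuery(String theDate, String theHour) {
        // Assumption: partition values contain only digits (and hyphens for
        // the date). They are concatenated into the query, so anything else
        // is rejected rather than passed through.
        if (!theDate.matches("[0-9-]+") || !theHour.matches("[0-9]+")) {
            throw new IllegalArgumentException("unexpected partition value");
        }
        String partitionFilter =
            "log_date_part='" + theDate + "' AND log_hour_part='" + theHour + "'";
        String partitionSpec =
            "log_date_part='" + theDate + "', log_hour_part='" + theHour + "'";

        return "FROM ( "
             + "FROM raw_web_data_hour "
             + "SELECT TRANSFORM(remote_ip) "
             + "USING 'perl geo_state.pl' "
             + "AS ip, country_code3, region "
             + "WHERE " + partitionFilter + " "
             + ") a "
             + "INSERT OVERWRITE TABLE raw_web_data_hour_geo "
             + "PARTITION (" + partitionSpec + ") "
             + "SELECT a.country_code3, a.region, a.ip, count(1) AS theCount "
             + "GROUP BY a.country_code3, a.region, a.ip";
    }

    public static void main(String[] args) {
        // Print the statement for one example hour so it can be inspected
        // before it is handed to Hive.
        System.out.println(buildGeoQuery("2010-02-15", "06"));
    }
}
```

Printing the statement first makes quoting mistakes in the partition clauses easy to spot before the job is launched.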


#!/usr/bin/perl
use Geo::IP;
use strict;

# Open the GeoIP City database once; each line on STDIN is one IP address.
my $gi = Geo::IP->open("/usr/local/share/GeoIP/GeoIPCity.dat", GEOIP_STANDARD);
while (<STDIN>) {
  #my $record = $gi->record_by_name("209.191.139.200");
  chomp($_);
  my $record = $gi->record_by_name($_);
  print STDERR "was sent $_ \n";
  if (defined $record) {
    # Emit ip, country_code3, region as tab-separated columns for Hive.
    print $_ . "\t" . $record->country_code3 . "\t" . $record->region . "\n";
    print STDERR "return " . $record->region . "\n";
  } else {
    # No match for this address: emit a placeholder row.
    print "??\n";
    print STDERR "return was undefined \n";
  }
}

Good luck.
