Carl:
> There's also a Cloudera Blog post from a while back about analyzing GeoIP
> data using Pig here:
> http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/
>
> While less efficient than a UDF, I think you can probably call this
> Perl script from a Hive TRANSFORM query without making any changes.
> See http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform

Yeah, I want to do something along those lines, but the Hive distribution I am using (Amazon's) is mangling file names to the point that I can't fetch additional libraries. This makes grabbing the required Perl module a bit challenging. Can you rename a file on the local filesystem after issuing an add file command? Something along the lines of: add file s3://bucket/file.pm#file.pm?

Adam
