> So, I'm back to square one. Is there *any* way I can do this using Hive
> alone? I'm fine with running the data through multiple passes, putting it in
> temporary tables, if need be. Should I be looking at UDF or SerDe to achieve
> this?
>


One way I'm trying out is to have multiple UDFs, each taking the raw log
entry as input and returning a specific field. For example,
extract_ip_address, extract_apache_uid, extract_uri, etc.
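
A rough sketch of the per-field idea, in plain Java so it runs without Hive
on the classpath. It assumes the entries follow the Apache Common Log Format;
the class name LogFields and the regex are illustrative, not taken from any
existing code. extract_apache_uid is left out because its position in the
line depends on the LogFormat in use. To call these from Hive, each method
would presumably be wrapped in its own class extending
org.apache.hadoop.hive.ql.exec.UDF with an evaluate() method; that wrapper is
not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one static method per field, each taking the raw log entry.
final class LogFields {
    // Assumed layout (Common Log Format):
    // host ident user [timestamp] "METHOD uri protocol" status bytes
    private static final Pattern ENTRY = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) (\\S+)");

    private LogFields() {}

    // Returns the requested capture group, or null if the line is null
    // or does not match the assumed layout.
    private static String group(String rawEntry, int index) {
        if (rawEntry == null) {
            return null;
        }
        Matcher m = ENTRY.matcher(rawEntry);
        return m.find() ? m.group(index) : null;
    }

    // Counterpart of extract_ip_address: the first field (client host).
    static String extractIpAddress(String rawEntry) {
        return group(rawEntry, 1);
    }

    // Counterpart of extract_uri: the second token inside the quoted request.
    static String extractUri(String rawEntry) {
        return group(rawEntry, 6);
    }
}
```

Each method re-runs the same regex over the line, so a query that pulls
several fields parses every entry several times. That is the cost of keeping
one UDF per field.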

Anything simpler?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
