Hi Eric,

Could you clarify what you mean by 'not delimited properly'?


Here's what a line in the log file looks like. Fields are separated by
spaces; some are quoted, some are not. The timestamp is enclosed in square
brackets, and the month is a three-letter name rather than a number
(Jan/Feb/Mar, etc.).

ip_address "-" apache_uid [dd/MMM/yyyy:HH:mm:ss +0530] "GET /location
HTTP/1.1" response_code response_size "referrer" "user_agent_string"
"cookies"
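
To make the layout concrete, here is a minimal Python sketch of how such a
line could be split into fields. It is only an illustration: the field names,
the name "identity" for the quoted "-" column, and the sample line are
assumptions based on the format shown above (which is a single line in the
log, wrapped here by the mail client). It also assumes quoted fields contain
no embedded double quotes.

```python
import re

# A token is one of: [bracketed text], "quoted text", or a bare run of
# non-space characters. Assumes no escaped quotes inside quoted fields.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

# Hypothetical field names, in the order given in the format above.
FIELDS = [
    "ip_address", "identity", "apache_uid", "timestamp", "request",
    "response_code", "response_size", "referrer", "user_agent", "cookies",
]

def parse_line(line):
    """Return a dict of field name -> value, or None if the count is off."""
    values = [
        next(g for g in m.groups() if g is not None)
        for m in TOKEN.finditer(line)
    ]
    if len(values) != len(FIELDS):
        return None
    return dict(zip(FIELDS, values))

# Made-up sample line that follows the format above.
sample = (
    '10.0.0.1 "-" - [12/Mar/2010:14:05:09 +0530] '
    '"GET /location HTTP/1.1" 200 1234 "http://example.com/" '
    '"Mozilla/5.0 (X11; Linux)" "session=abc"'
)
record = parse_line(sample)
```

A line with a different number of tokens yields None rather than a
misaligned record, so malformed lines are easy to count and inspect.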



> Also, you should probably set your nonstandard timestamp columns to be of
> string type because hive does not currently natively support timestamps.
> However, if your timestamps are in numerical form such as unix time, you can
> set the columns to be of int type.


I'm fine with a string format for the timestamp, but I first need to convert
the month names to numbers and change the format to 'yyyy-mm-dd hh24:mi:ss'
so that sorting the strings also sorts them by time.
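
The conversion I have in mind can be sketched in Python roughly as follows.
This is a minimal illustration, not a Hive feature: the function name
to_sortable is made up, and it assumes every line carries the same UTC offset
(+0530 above), so the offset is simply dropped. The month lookup is an
explicit table, so it does not depend on the system locale.

```python
import re

# Three-letter English month names mapped to their numbers (Jan -> 1).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# Matches dd/MMM/yyyy:HH:mm:ss at the start; a trailing offset is ignored.
TIMESTAMP = re.compile(r"(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})")

def to_sortable(timestamp):
    """Convert 'dd/MMM/yyyy:HH:mm:ss +0530' to 'yyyy-mm-dd HH:mm:ss'.

    Returns None if the input does not match the expected layout.
    """
    match = TIMESTAMP.match(timestamp)
    if match is None or match.group(2) not in MONTHS:
        return None
    day, month, year, hour, minute, second = match.groups()
    return f"{year}-{MONTHS[month]:02d}-{day} {hour}:{minute}:{second}"

converted = to_sortable("12/Mar/2010:14:05:09 +0530")
```

Because every component is zero-padded and ordered from year down to second,
plain string comparison of the converted values matches chronological order.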

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
