Hi Eric,

Could you clarify what you mean by 'not delimited properly'?
Here's what a line in the log file looks like. Fields are separated by spaces; some are quoted and some are not. The timestamp is enclosed in square brackets, and the month is given as a name (Jan, Feb, Mar, etc.) rather than a number:

ip_address "-" apache_uid [dd/MMM/yyyy:HH:mm:ss +0530] "GET /location HTTP/1.1" response_code response_size "referrer" "user_agent_string" "cookies"

> Also, you should probably set your nonstandard timestamp columns to be of
> string type because hive does not currently natively support timestamps.
> However, if your timestamps are in numerical form such as unix time, you can
> set the columns to be of int type.

I'm fine with a string format for the timestamp, but first I need to convert the month names to numbers and change the format to 'yyyy-mm-dd hh24:mi:ss' so that any kind of time-based sorting works.

Saurabh.

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
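For illustration, here is a minimal Python sketch of that preprocessing step: it splits one line of the format above into fields and rewrites the bracketed timestamp as 'yyyy-mm-dd hh24:mi:ss'. It assumes every line matches the format exactly (no escaped quotes inside quoted fields) and that all lines share the same +0530 offset, which is simply dropped. The names `parse_line`, `normalize_timestamp`, and the sample line are hypothetical, not part of Hive or Apache.

```python
import re
from typing import Optional

# Month abbreviations as they appear in the Apache timestamp, mapped to 1..12.
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# One named group per field of the (assumed) log format.
LOG_PATTERN = re.compile(
    r'(?P<ip_address>\S+) '
    r'(?P<ident>"[^"]*"|\S+) '          # the "-" field, quoted or not
    r'(?P<apache_uid>\S+) '
    r'\[(?P<raw_time>[^\]]+)\] '        # dd/MMM/yyyy:HH:mm:ss +0530
    r'"(?P<request>[^"]*)" '
    r'(?P<response_code>\d{3}) '
    r'(?P<response_size>\S+) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)" '
    r'"(?P<cookies>[^"]*)"'
)


def normalize_timestamp(raw: str) -> str:
    """Turn '10/Oct/2009:13:55:36 +0530' into '2009-10-10 13:55:36'.

    The UTC offset is discarded, so this assumes every line uses the same one.
    """
    stamp = raw.split(" ")[0]                 # '10/Oct/2009:13:55:36'
    day, month_name, rest = stamp.split("/")
    year, hh, mm, ss = rest.split(":")
    return f"{year}-{MONTHS[month_name]:02d}-{int(day):02d} {hh}:{mm}:{ss}"


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into fields; return None if it does not match."""
    m = LOG_PATTERN.fullmatch(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["timestamp"] = normalize_timestamp(fields.pop("raw_time"))
    return fields


if __name__ == "__main__":
    sample = ('192.168.1.10 "-" alice [10/Oct/2009:13:55:36 +0530] '
              '"GET /index.html HTTP/1.1" 200 2326 "http://example.com/" '
              '"Mozilla/5.0" "sid=abc"')
    print(parse_line(sample)["timestamp"])  # 2009-10-10 13:55:36
```

Because the result is zero-padded and ordered from year down to second, plain string comparison on it sorts chronologically, which is what makes a string-typed timestamp column usable for time-based sorting.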
