We are importing Hadoop logs into Hive, but are running into some issues.
Sample log line:
2010-02-25 14:27:18,000 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
The query SELECT * FROM logs_temp; runs fine on the line above.
However, for the log lines:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at cm-hadoop01.mozilla.org/10.2.72.53
************************************************************/
running the same query (SELECT * FROM logs_temp;) fails with:
Failed with exception java.io.IOException:java.lang.NullPointerException
However, SELECT count(1) FROM logs_temp; returns 3, which is the correct number of rows.
The table is defined as follows:
add jar /usr/lib/hive/lib/hive_contrib.jar;
CREATE EXTERNAL TABLE logs_temp(
line_date STRING,
line_time STRING,
message_type STRING,
classname STRING,
message STRING
)
PARTITIONED BY (ds STRING, ts STRING, hn STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" =
"^(\\d{4}(?>-\\d{2}){2})\\s((?>\\d{2}[:,]){3}\\d{3})\\s([A-Z]+)\\s([^:]+):\\s(.*)"
)
STORED AS TEXTFILE;
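One way to see which lines RegexSerDe can actually parse is to run the same input.regex through java.util.regex outside Hive. This is a rough sketch, not Hive's own code: the class name RegexCheck is hypothetical, it assumes RegexSerDe requires a whole-line match (Matcher.matches()), and it assumes the first sample line and the SHUTDOWN_MSG line were each a single line in the log file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    // Same pattern as "input.regex" in the table definition; the Java string
    // escaping happens to be identical to the HiveQL string escaping.
    static final Pattern LINE = Pattern.compile(
        "^(\\d{4}(?>-\\d{2}){2})\\s((?>\\d{2}[:,]){3}\\d{3})\\s([A-Z]+)\\s([^:]+):\\s(.*)");

    // Assumption: RegexSerDe only accepts a row when the WHOLE line matches.
    static boolean matches(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2010-02-25 14:27:18,000 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:",
            "/************************************************************",
            "SHUTDOWN_MSG: Shutting down TaskTracker at cm-hadoop01.mozilla.org/10.2.72.53",
            "************************************************************/");
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                // Print the five captured groups, one per table column.
                System.out.printf("MATCH   date=%s time=%s type=%s class=%s msg=%s%n",
                    m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
            } else {
                System.out.println("NOMATCH " + line);
            }
        }
    }
}
```

Under those assumptions only the first line prints MATCH; the three banner lines print NOMATCH, which lines up with SELECT * failing only on them while count(1) still counts them.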
Any idea what might be going wrong here?
-Anurag