Yes Ken. Please try http://www.fileformat.info/tool/regex.htm to test your regex to see if it can match your data or not.
Zheng On Mon, Nov 30, 2009 at 6:11 PM, <[email protected]> wrote: > I looked in hive.log while doing the CREATE TABLE and found: > > 2009-11-30 17:51:53,240 WARN serde2.RegexSerDe > (RegexSerDe.java:deserialize(179)) - 1 unmatched rows are found: Nov 2 > 23:00:04 sf45-cc2 sendmail[10905]: [ID 801593 mail.info] nA3704qO010905: > from=nagios, size=286, class=0, nrcpts=1, > msgid=<[email protected]>, > relay=nag...@localhost > > So instead of grabbing a log file at random, I went back and used exactly > the same one I did when I was testing the Serde. > This time, the SELECT statement worked fine. > > Maybe I’ll run my regex over a large sampling of our syslog files and see > how many don’t conform to it. I may have to change my regex to be more > forgiving of deviations… > _____________________________________________ > From: Barclay, Ken > Sent: Monday, November 30, 2009 5:07 PM > To: [email protected] > Subject: Table created using RegexSerDe doesn't work when made external > > > Hello, > > I’m getting only NULLs when trying to read data from an external table for > which the data was staged. I’m using Cloudera’s hive-0.4.0+14.tar.gz with > hadoop-0.20.1+152.tar.gz on a Centos machine. > > My steps were: > > # CREATE TABLE > > CREATE EXTERNAL TABLE mailog_stg(month STRING, day STRING, time STRING, host > STRING, logline STRING) > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' > WITH SERDEPROPERTIES ( > "input.regex" = "^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.+)$", > "output.format.string" = "%1$s %2$s %3$s %4$s %5$s" > ) > STORED AS TEXTFILE > LOCATION '/user/data/staging/mailog'; > > # COPY DATA FILE > > hadoop dfs -put /ken/sf45-cc2-maillogs/mail.20091102 > /user/data/staging/mailog > > # DO A SELECT > > hive> select * from mailog_stg; > OK > NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL > > I verified my data is out there: > > [r...@jims-desktop ~]# hadoop dfs -cat /user/data/staging/mailog/m*|wc > 4 77 1016 > > And in fact when I used LOAD DATA INPATH with a table defined the same way > but without making it external, it worked fine. > > Any suggestions how to debug this? > > Thanks > Ken > > -- Yours, Zheng
