> Currently SerDe has to return one row from one line.
> You can either do what you mentioned, or write a new InputFileFormat
> which filters out the non-data lines.
>
> Zheng

Is it possible to throw an exception while parsing a specific line without causing the whole task to fail? I am parsing S3 logs, and occasionally they contain a misformatted line.

Hadoop's jobdetails.jsp view shows a counter called org.apache.hadoop.hive.ql.exec.MapOperator$Counter DESERIALIZE_ERRORS, so presumably this can be done somehow?

If this doesn't work, I'll return a "dummy" record. That would be a bit of a hack, but still better than a big job failing somewhere in the middle.

-- 
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: [email protected]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
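
For concreteness, the "dummy" record fallback mentioned above could look roughly like the sketch below. It is plain Java and does not use the Hive SerDe API; the class name LenientLineParser, the fixed field count, and splitting on single spaces are all assumptions made for illustration (real S3 log lines contain quoted and bracketed fields and would need a proper parser). A misformatted line yields a row of nulls and increments a local error count, so bad rows can be filtered out later, for example with a WHERE ... IS NOT NULL clause.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: turns one text line into one row,
// substituting an all-null "dummy" row when the line is misformatted.
final class LenientLineParser {
    private final int expectedFields;
    private long errorCount = 0;

    LenientLineParser(int expectedFields) {
        this.expectedFields = expectedFields;
    }

    // Always returns exactly one row with expectedFields columns.
    List<String> parse(String line) {
        // Assumption: fields are separated by single spaces.
        String[] parts = (line == null) ? new String[0] : line.split(" ", -1);
        if (parts.length != expectedFields) {
            errorCount++;
            // Dummy record: every column is null.
            return Arrays.asList(new String[expectedFields]);
        }
        return Arrays.asList(parts);
    }

    // Number of lines that were replaced by a dummy record so far.
    long getErrorCount() {
        return errorCount;
    }
}
```

Inside a real SerDe, the same idea would mean catching the parse failure in the deserialization method and returning the null-filled row instead of propagating the exception.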
