> Currently SerDe has to return one row from one line.
> You can either do what you mentioned, or write a new InputFileFormat
> which filters out the non-data lines.
> 
> Zheng

Is it possible to throw an exception while parsing a specific line
without causing the whole task to fail?

I am parsing S3 logs, and occasionally they contain a malformed line.

Hadoop's jobdetails.jsp view shows a counter called
org.apache.hadoop.hive.ql.exec.MapOperator$Counter DESERIALIZE_ERRORS,
so presumably this can be done somehow?

If that isn't possible, I'll return a "dummy" record instead. It's a bit
of a hack, but still better than having a big job fail somewhere in the
middle.
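For illustration, the "dummy record" fallback could look roughly like the minimal sketch below. It is deliberately Hive-independent: the class `LenientLogParser`, its method names and the simplified eight-field line layout are all hypothetical assumptions, not Hive's SerDe API and not the full S3 access-log format. A malformed line yields a row of nulls instead of an exception, and a counter records how many lines were handled that way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: lenient parsing that never throws on a bad line. */
public class LenientLogParser {
    // Assumed, simplified layout: owner bucket [time] ip requester requestId operation key.
    // Real S3 access logs carry more fields; extend the pattern as needed.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) \\[([^\\]]+)\\] (\\S+) (\\S+) (\\S+) (\\S+) (\\S+)");
    private static final int NUM_FIELDS = 8;

    // Number of lines that did not match and were replaced by a null row.
    private long malformedLines = 0;

    /** Returns exactly one row per input line; malformed lines become all-null rows. */
    public List<String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            malformedLines++;
            return Collections.nCopies(NUM_FIELDS, (String) null);
        }
        List<String> row = new ArrayList<>(NUM_FIELDS);
        for (int i = 1; i <= NUM_FIELDS; i++) {
            row.add(m.group(i));
        }
        return row;
    }

    public long getMalformedLines() {
        return malformedLines;
    }

    public static void main(String[] args) {
        LenientLogParser p = new LenientLogParser();
        List<String> good = p.parse("owner mybucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 "
            + "requester 3E57427F3EXAMPLE REST.GET.OBJECT photos/a.jpg \"GET /a.jpg HTTP/1.1\" 200");
        List<String> bad = p.parse("garbage line");
        System.out.println(good.get(1));          // bucket field of the well-formed line
        System.out.println(bad);                  // all-null row for the malformed line
        System.out.println(p.getMalformedLines());
    }
}
```

Because every line still produces exactly one row, this stays within the one-row-per-line constraint. Assuming the table's columns are nullable, the dummy rows can then be dropped in the query, e.g. with `WHERE bucket IS NOT NULL`.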


-- 
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
tel: +386 41 515 767
twitter: andraz, skype: minmax_test


