Policy on deserialization errors
--------------------------------

                 Key: HIVE-1419
                 URL: https://issues.apache.org/jira/browse/HIVE-1419
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 0.5.0
            Reporter: Vladimir Klimontovich
            Assignee: Vladimir Klimontovich
            Priority: Minor
             Fix For: 0.5.1, 0.6.0

When the deserializer throws an exception, the whole map task fails (see MapOperator.java). This is not always convenient behavior, especially on huge datasets where a few corrupted lines can be normal.

Proposed solution:
1) Keep a counter of corrupted records.
2) When the counter exceeds a limit (configurable via the hive.max.deserializer.errors property, 0 by default), throw an exception. Otherwise, just log the exception at WARN level.

Patches for the 0.5 branch and trunk are attached.
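
To make the proposed policy concrete, here is a minimal standalone sketch of the counter-and-limit logic. It is not the code from the attached patches: the class DeserializerErrorPolicy, its methods, and the use of System.err in place of Hive's logger are all hypothetical, and the limit is passed to the constructor rather than read from hive.max.deserializer.errors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of the proposed policy; not the attached patch. */
public class DeserializerErrorPolicy {
    // Stands in for the hive.max.deserializer.errors property (0 by default).
    private final long maxErrors;
    private long errorCount = 0;

    public DeserializerErrorPolicy(long maxErrors) {
        this.maxErrors = maxErrors;
    }

    public long getErrorCount() {
        return errorCount;
    }

    /** Counts one failure; fails once the count exceeds the limit, otherwise warns. */
    public void onError(Exception cause) {
        errorCount++;
        if (errorCount > maxErrors) {
            throw new IllegalStateException(
                "Deserializer error count " + errorCount + " exceeds limit " + maxErrors, cause);
        }
        // Assumption: a real implementation would log through the task's logger at WARN level.
        System.err.println("WARN: skipping corrupted record (" + errorCount
            + " of " + maxErrors + " tolerated): " + cause);
    }

    /** Applies the deserializer to each record, skipping records that fail within the limit. */
    public <T> List<T> deserializeAll(List<String> rawRecords, Function<String, T> deserializer) {
        List<T> rows = new ArrayList<>();
        for (String raw : rawRecords) {
            try {
                rows.add(deserializer.apply(raw));
            } catch (RuntimeException e) {
                onError(e); // rethrows once the limit is exceeded
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        DeserializerErrorPolicy policy = new DeserializerErrorPolicy(1);
        List<Integer> rows =
            policy.deserializeAll(Arrays.asList("1", "oops", "3"), Integer::parseInt);
        System.out.println(rows + " errors=" + policy.getErrorCount());
    }
}
```

With a limit of 1, the sample run skips the one bad record and prints "[1, 3] errors=1". With the default limit of 0, the first bad record raises the exception, which matches the current behavior of failing the whole map task.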