Policy on deserialization errors
--------------------------------
Key: HIVE-1419
URL: https://issues.apache.org/jira/browse/HIVE-1419
Project: Hadoop Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 0.5.0
Reporter: Vladimir Klimontovich
Assignee: Vladimir Klimontovich
Priority: Minor
Fix For: 0.5.1, 0.6.0
When a deserializer throws an exception, the whole map task fails (see
MapOperator.java). This behavior is not always convenient, especially on
huge datasets, where a few corrupted lines can be expected.
Proposed solution:
1) Keep a counter of corrupted records.
2) When the counter exceeds a limit (configurable via the
hive.max.deserializer.errors property, 0 by default), throw an exception.
Otherwise, just log the exception at WARN level.
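
The two steps above can be sketched roughly as follows. This is only an
illustration of the proposed policy, not the code in the attached patches: the
class DeserializerErrorPolicy and its methods are hypothetical names, and
java.util.logging stands in for whatever logger MapOperator actually uses. The
limit passed to the constructor is assumed to come from
hive.max.deserializer.errors.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper illustrating the proposed policy; not part of Hive. */
public class DeserializerErrorPolicy {
    private static final Logger LOG =
        Logger.getLogger(DeserializerErrorPolicy.class.getName());

    // Assumed to be read from hive.max.deserializer.errors.
    // With the default of 0, the first error already exceeds the limit,
    // so the current fail-fast behavior is preserved.
    private final long maxErrors;

    // Step 1: counter of corrupted records seen so far.
    private long errorCount = 0;

    public DeserializerErrorPolicy(long maxErrors) {
        this.maxErrors = maxErrors;
    }

    /** Call once for every record the deserializer fails on. */
    public void onError(Exception cause) {
        errorCount++;
        // Step 2: fail once the counter exceeds the limit...
        if (errorCount > maxErrors) {
            throw new RuntimeException("Deserialization errors (" + errorCount
                + ") exceeded limit " + maxErrors, cause);
        }
        // ...otherwise log at warning level and let the caller skip the record.
        LOG.log(Level.WARNING, "Skipping corrupted record " + errorCount
            + " (limit " + maxErrors + ")", cause);
    }

    public long getErrorCount() {
        return errorCount;
    }
}
```

Under these assumptions, a caller would wrap the deserialize call in a
try/catch and pass the caught exception to onError(); with a limit of 2, the
first two corrupted records are logged and skipped and the third fails the
task.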
Patches for the 0.5 branch and trunk are attached.