Policy on deserialization errors
--------------------------------

                 Key: HIVE-1419
                 URL: https://issues.apache.org/jira/browse/HIVE-1419
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 0.5.0
            Reporter: Vladimir Klimontovich
            Assignee: Vladimir Klimontovich
            Priority: Minor
             Fix For: 0.5.1, 0.6.0


When a deserializer throws an exception, the whole map task fails (see 
MapOperator.java). This is not always convenient, especially on huge 
datasets where a few corrupted lines are to be expected. 
Proposed solution:

1) Keep a counter of corrupted records.
2) When the counter exceeds a limit (configurable via the 
hive.max.deserializer.errors property, 0 by default), throw an exception. 
Otherwise, just log the exception at WARN level.
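
The two steps above could be sketched roughly as follows. This is only an 
illustration of the counter-and-limit idea, not the code from the attached 
patches: the class name DeserializerErrorPolicy and its methods are 
hypothetical, java.util.logging stands in for whatever logger Hive uses, and 
RuntimeException stands in for the exception type the map task would 
actually throw.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper illustrating the proposed policy; not part of Hive.
final class DeserializerErrorPolicy {
    private static final Logger LOG =
        Logger.getLogger(DeserializerErrorPolicy.class.getName());

    // Assumed to be read from hive.max.deserializer.errors (0 by default).
    private final long maxErrors;
    private long errorCount = 0;

    DeserializerErrorPolicy(long maxErrors) {
        this.maxErrors = maxErrors;
    }

    // Counts one deserialization failure. Throws once the counter exceeds
    // the limit; otherwise logs the exception at WARN level and returns so
    // the caller can skip the record.
    void onError(Exception cause) {
        errorCount++;
        if (errorCount > maxErrors) {
            throw new RuntimeException(
                "Deserialization errors (" + errorCount
                    + ") exceeded limit " + maxErrors, cause);
        }
        LOG.log(Level.WARNING,
            "Skipping corrupted record (" + errorCount + " of "
                + maxErrors + " allowed)", cause);
    }

    long getErrorCount() {
        return errorCount;
    }
}
```

With the default limit of 0 the first error already exceeds the limit, so 
the task fails exactly as it does today; a positive limit tolerates that 
many corrupted records before failing.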

Patches for the 0.5 branch and trunk are attached.


