[ https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Klimontovich updated HIVE-1419:
----------------------------------------

    Attachment: corrupted_records_0.5.patch
                corrupted_records_trunk.patch

> Policy on deserialization errors
> --------------------------------
>
>                 Key: HIVE-1419
>                 URL: https://issues.apache.org/jira/browse/HIVE-1419
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Vladimir Klimontovich
>            Assignee: Vladimir Klimontovich
>            Priority: Minor
>             Fix For: 0.5.1, 0.6.0
>
>         Attachments: corrupted_records_0.5.patch, corrupted_records_trunk.patch
>
>
> When the deserializer throws an exception, the whole map task fails (see MapOperator.java). This is not always convenient behavior, especially on huge datasets where a few corrupted lines can be expected.
> Proposed solution:
> 1) Keep a counter of corrupted records.
> 2) When the counter exceeds a limit (configurable via the hive.max.deserializer.errors property, 0 by default), throw an exception. Otherwise just log the exception at WARN level.
> Patches for the 0.5 branch and trunk are attached.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
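A minimal sketch of the proposed policy, not the attached patch: the class name DeserializationErrorPolicy, the method onCorruptedRecord, and the use of java.util.logging are illustrative assumptions; only the hive.max.deserializer.errors property, its default of 0, and the "count, warn, then fail" behavior come from the description above.

```java
// Illustrative sketch only -- not the code in corrupted_records_*.patch.
// Property name and thresholding behavior are from the issue description;
// class and method names are assumptions for this example.
import java.util.logging.Logger;

public class DeserializationErrorPolicy {

    private static final Logger LOG =
            Logger.getLogger(DeserializationErrorPolicy.class.getName());

    private final long maxErrors;  // value of hive.max.deserializer.errors (0 by default)
    private long errorCount = 0;   // corrupted records seen so far in this task

    public DeserializationErrorPolicy(long maxErrors) {
        this.maxErrors = maxErrors;
    }

    /**
     * Call when a deserializer throws. While the counter stays within the
     * configured limit, the error is logged at WARN level and swallowed;
     * once the limit is exceeded, the exception is rethrown so the map
     * task still fails on genuinely bad input.
     */
    public void onCorruptedRecord(Object rawRecord, Exception cause) throws Exception {
        errorCount++;
        if (errorCount > maxErrors) {
            throw cause;
        }
        LOG.warning("Skipping corrupted record #" + errorCount
                + " (limit " + maxErrors + "): " + cause.getMessage());
    }
}
```

With the default limit of 0, the first corrupted record still rethrows and fails the task, matching the current behavior; setting the property higher tolerates that many bad lines per task before failing.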