[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239586#comment-13239586 ]
Jonathan Coveney commented on PIG-2614:
---------------------------------------

Thanks for taking a look. I'll probably just remove the dependence on the logger and rely on the counters. As for the bounds, it should be:

pig.piggybank.storage.avro.bad.record.threshold=0.01
pig.piggybank.storage.avro.bad.record.min=100

The threshold is the error/record ratio you're willing to tolerate (a usage sketch follows the quoted issue below). Do you think there is any chance you could submit a dataset with some known bad rows and some known good rows? That would let me troubleshoot this. I'll fix the logging issue, though.

> AvroStorage crashes on LOADING a single bad record
> --------------------------------------------------
>
>                 Key: PIG-2614
>                 URL: https://issues.apache.org/jira/browse/PIG-2614
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10, 0.11
>            Reporter: Russell Jurney
>              Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2614_0.patch
>
> AvroStorage dies when a single bad record exists, such as one with missing fields. This is very bad on 'big data,' where bad records are inevitable. See the discussion at http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss for more theory.
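For illustration, a minimal Pig Latin sketch of how the two properties could be set from a script, assuming the patched AvroStorage reads them from the job configuration and that the .min property is a record floor before the ratio check applies; the jar and input paths here are placeholders:

REGISTER piggybank.jar;  -- plus the avro jars AvroStorage depends on

-- tolerate up to 1% bad records (assumption: the ratio is only enforced
-- once at least 100 records have been read, per the .min property)
set pig.piggybank.storage.avro.bad.record.threshold 0.01;
set pig.piggybank.storage.avro.bad.record.min 100;

-- 'input.avro' is a placeholder path
records = LOAD 'input.avro'
          USING org.apache.pig.piggybank.storage.avro.AvroStorage();
DUMP records;

With the patch applied, a load that stays under the threshold should complete and report skipped rows through the counters mentioned above, rather than killing the job on the first bad record.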