I think these options make sense: 1) Always fail. One bad record and the whole job fails, which is the current Hive behavior. 2) Always succeed. Ignore bad records (saving them somewhere to allow further analysis) and the job still succeeds.
Option 3 (success with condition) would normally be handled by the client.

>> What can be done is make this configurable and let the user decide which
>> setting is appropriate for his application

IMHO that is a perfect assessment. Error handling is normally application specific. I can imagine that doing this would result in a generic API that would likely be unable to meet the needs of every user and might fall outside the scope of Hive.

>> Oracle has a parameter whereby the number of errors to be ignored can be
>> configured, same for sql*loader.
>> We can follow the same approach: if the number of bad records exceeds a
>> certain number, kill the job, otherwise continue

This seems useful, though it might be harder to implement in a distributed system than it is in Oracle.
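To make the threshold idea concrete, here is a minimal sketch of that sql*loader-style knob. All names here (`max_bad_records`, `BadRecordLimitExceeded`, `process_records`) are hypothetical illustrations, not Hive or Hadoop APIs:

```python
class BadRecordLimitExceeded(Exception):
    """Raised when the configured bad-record limit is exceeded."""
    pass

def process_records(records, parse, max_bad_records=100):
    """Parse each record, collecting failures for later analysis.

    Bad records are skipped and saved, and the whole job fails only
    when their count exceeds max_bad_records.
    """
    good, bad = [], []
    for rec in records:
        try:
            good.append(parse(rec))
        except ValueError:
            bad.append(rec)  # keep bad records around for further analysis
            if len(bad) > max_bad_records:
                raise BadRecordLimitExceeded(
                    f"{len(bad)} bad records exceeds limit of {max_bad_records}")
    return good, bad
```

Note this only counts records seen by a single task; enforcing a global limit across a distributed job would require the framework to aggregate per-task counts (e.g. via counters), which is exactly the part that makes this harder than in Oracle.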
