I think these options make sense:

1) Always fail. One bad record and the whole job fails, which is the
current Hive behavior.
2) Always succeed. Ignore bad records (saving them somewhere to allow
further analysis) and the job still succeeds.
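To make the two policies concrete, here is a minimal sketch (not Hive code; the `parse` helper and field format are hypothetical) of a loader that either fails fast on the first bad record or collects bad records and carries on:

```python
def parse(record):
    # Hypothetical parser: expects "key,value" records.
    key, value = record.split(",")
    return key, int(value)

def load(records, fail_fast=True):
    good, bad = [], []
    for rec in records:
        try:
            good.append(parse(rec))
        except ValueError:
            if fail_fast:      # Option 1: one bad record kills the job
                raise
            bad.append(rec)    # Option 2: save bad records for analysis
    return good, bad

rows = ["a,1", "oops", "b,2"]
good, bad = load(rows, fail_fast=False)
# good == [("a", 1), ("b", 2)], bad == ["oops"]
```

Under option 2 the `bad` list would be written out somewhere (a side file, a table) so the user can inspect it later.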

Option 3 (Success with condition) would normally be handled by the client.
>>What can be done is make this configurable and let the user decide which 
>>setting is appropriate for his application
IMHO that is a perfect assessment. Error handling is normally
application specific. I can imagine doing this would result in a
generic API that likely would not be able to meet the needs of every
user and might fall outside the scope of Hive.

>>Oracle has a parameter whereby the number of errors to be ignored can be 
>>configured, same for sql*loader.
>>We can follow the same approach, if the number of bad records exceed a 
>>certain number, kill the job, otherwise continue

This seems useful, though it might be harder to implement in a
distributed system than it is in Oracle.
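The Oracle/sql*loader-style threshold could be sketched like this (the limit parameter name is made up for illustration; in a distributed job each task only sees its own count, so the check would either apply per task or require aggregating a global counter):

```python
class TooManyBadRecords(Exception):
    pass

def load_with_limit(records, max_bad=0):
    # Abort once the number of bad records exceeds the configured limit,
    # otherwise skip them and continue -- analogous to sql*loader's
    # error limit. max_bad=0 reproduces the "always fail" behavior.
    bad, good = 0, []
    for rec in records:
        try:
            good.append(int(rec))
        except ValueError:
            bad += 1
            if bad > max_bad:  # threshold exceeded: kill the job
                raise TooManyBadRecords(
                    f"{bad} bad records exceeds limit {max_bad}")
    return good
```

For example, `load_with_limit(["1", "x", "2"], max_bad=1)` tolerates the one bad record and returns `[1, 2]`, while a second bad record would raise. The distributed wrinkle is that a per-task check can kill a job whose global error count is still under the limit, and a global check needs counter aggregation, which is what makes this harder than the single-process Oracle case.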
