Hi Zheng,

I have opened a Jira (HIVE-295).

IMHO there are three ways errors can be handled:

1) Always fail. One bad record and the whole job fails, which is the current Hive behavior.
2) Always succeed. Bad records are ignored (and saved somewhere to allow further analysis) and the job still succeeds.
3) Succeed with a condition. Something in the middle ground, as you described.

What can be done is to make this configurable and let the user decide which setting is appropriate for their application. In practice I would imagine 2) will be the most common case (e.g. a 0.1% error rate).

BTW, just curious: since you guys already use Hive in prod, how do you guarantee the input is 100% clean, given that Hive doesn't do any checking by itself?

One thing I'm not sure about is whether the error-handling logic belongs in the Hive layer or the Hadoop layer. Hadoop 0.19 already supports 2)
http://hadoop.apache.org/core/docs/r0.19.0/mapred_tutorial.html#Skipping+Bad+Records
and may support 3) in the future. So the black-box way is for Hive to just expose those API calls or, as a general approach, to allow the user to add an "aspect" to the JobConf object. Is this allowed in the Hive design?

Regards,
Qing

On Thu, Feb 19, 2009 at 5:59 PM, Zheng Shao <[email protected]> wrote:
> Hi Qing,
>
> That's a good idea. Can you open a jira?
> There are lots of details before we can add that feature to Hive. For
> example, how to specify the largest number of data corruption that can
> be accepted, by absolute number or percentage, etc. What about half
> corrupted records in case we only need the non-corrupted part in the
> query, etc.
>
>
> Zheng
>
>
>
> On 2/19/09, Qing Yan <[email protected]> wrote:
> > Say I have some bad/ill-formatted records in the input, is there a way to
> > configure the default Hive parser to discard those records directly(e.g.
> > when a integer column get a string)?
> >
> > Besides, is the new skip-bad-records feature in 0.19 accessible in Hive?
> > It is a quite handy feature in the real world.
> >
> > What I see so far is the Hive parser throws exception and cause the whole
> > job to fail ultimately.
> >
> > Thanks for the help!
> >
> > Qing
> >
>
> Yours,
> Zheng
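
P.S. To make the three options concrete, here is a rough sketch of how a configurable policy might behave. It is plain Python for illustration only, not Hive or Hadoop code; `ErrorPolicy`, `TooManyBadRecords`, `parse_records` and `max_bad_fraction` are all made-up names, and treating a `ValueError` from the parser as "bad record" is an assumption.

```python
from enum import Enum


class ErrorPolicy(Enum):
    """Hypothetical names for the three options discussed in this thread."""
    ALWAYS_FAIL = 1      # option 1: the first bad record fails the job
    ALWAYS_SUCCEED = 2   # option 2: bad records are set aside, job succeeds
    CONDITIONAL = 3      # option 3: succeed only if the bad fraction is small


class TooManyBadRecords(Exception):
    """Raised under CONDITIONAL when the bad-record fraction is too high."""


def parse_records(lines, parse, policy, max_bad_fraction=0.001):
    """Parse each line and return (good_records, bad_lines).

    `parse` is any callable that raises ValueError on a malformed line.
    `max_bad_fraction` is only consulted under ErrorPolicy.CONDITIONAL.
    """
    good, bad = [], []
    for line in lines:
        try:
            good.append(parse(line))
        except ValueError:
            if policy is ErrorPolicy.ALWAYS_FAIL:
                raise  # current behavior: one bad record kills the job
            bad.append(line)  # keep the raw line for later analysis

    total = len(good) + len(bad)
    if policy is ErrorPolicy.CONDITIONAL and total > 0:
        fraction = len(bad) / total
        if fraction > max_bad_fraction:
            raise TooManyBadRecords(
                f"{len(bad)} of {total} records were bad "
                f"(limit is {max_bad_fraction:.2%})"
            )
    return good, bad
```

For example, `parse_records(["1", "2", "x", "4"], int, ErrorPolicy.ALWAYS_SUCCEED)` keeps the three integers and sets `"x"` aside, while the same input under `ErrorPolicy.ALWAYS_FAIL` raises on `"x"`. Whether the limit should be a fraction or an absolute count is exactly the open question Zheng raised; the fraction here is just one choice.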
