Normally in Hive, a table or partition is loaded by a single job or process in one go. Once it is loaded, you can't append or insert any more data into that table (unless you do it manually by moving data into that directory). So it is most probably easier to enforce the constraints in that loading process. This solution is not as nifty as what an RDBMS offers, but then the probability of inserting duplicates is much higher in an RDBMS.
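As a rough illustration of enforcing uniqueness in the loading process itself, here is a minimal Python sketch that drops rows with an already-seen key before the data is written out for Hive. The function name drop_duplicates, the tuple row layout, and the choice of the first column as the unique key are all assumptions made for the example; nothing here is a Hive API.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

Row = TypeVar("Row")


def drop_duplicates(rows: Iterable[Row], key: Callable[[Row], Hashable]) -> Iterator[Row]:
    """Yield each row whose key has not been seen yet, keeping the first occurrence.

    Hypothetical helper for a pre-load step; `key` extracts the column(s)
    that would be a unique key in an RDBMS.
    """
    seen = set()
    for row in rows:
        k = key(row)
        if k in seen:
            # Duplicate key: skip it instead of loading it into the table.
            continue
        seen.add(k)
        yield row


# Example: staged rows as (user_id, value) tuples, with user_id assumed unique.
staged = [("u1", "a"), ("u2", "b"), ("u1", "c")]
unique_rows = list(drop_duplicates(staged, key=lambda r: r[0]))
```

This keeps every seen key in memory, so it only suits loads whose key set fits on one machine. For larger loads the same check would have to run as part of the MapReduce/Hive job that produces the partition.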
________________________________
From: Jeff Hammerbacher <[email protected]>
Reply-To: <[email protected]>
Date: Thu, 29 Jan 2009 11:49:29 -0800
To: <[email protected]>
Cc: Zheng Shao <[email protected]>
Subject: Re: data integrity

Hey Shane,

One possibility would be to run a MapReduce/Hive job after the load that checks that your integrity constraints are met.

Regards,
Jeff

On Thu, Jan 29, 2009 at 10:03 AM, Zheng Shao <[email protected]> wrote:

IF was just added last evening. I will add it to the wiki today.
We don't have CASE, DECODE, etc. yet.

Zheng

On 1/29/09, Shane Brady <[email protected]> wrote:
> Hello,
>
> I'm rather new to Hive and have been playing with it the last couple weeks
> to see if it is appropriate to use for a particular project inside where I
> work. My essential question is, how to maintain data integrity inside the
> tables so that we don't accidentally load duplicate data. Normally we rely
> on indexes or unique keys to enforce this. Is there a general strategy for
> this in Hive?
>
> In a second question, I haven't seen anything like it in the docs, but is
> there any equivalent to CASE, DECODE, or IF-THEN-ELSE allowed in the query?
>
> Thanks!
>
> -Shane P. Brady
>

--
Sent from Gmail for mobile | mobile.google.com

Yours,
Zheng
