Re: data integrity

Prasad Chakka Thu, 29 Jan 2009 12:04:10 -0800

Normally in Hive, a table or partition is loaded by a single job/process all at 
once. Once it is loaded, you can't append or insert any more data into that 
table (unless you do it manually by moving data into that directory). So it is 
most probably easier to enforce the constraints in that loading process. This 
solution is not as nifty as an RDBMS, but the probability of inserting duplicates 
is much higher in an RDBMS.
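
As a rough illustration of enforcing a uniqueness constraint in the loading 
step itself, the load can rewrite the whole partition from a staging table and 
collapse exact duplicates on the way in. This is only a sketch: the table names 
(staging_events, events), the column names, and the ds partition value are all 
hypothetical and do not come from this thread.

```sql
-- Hypothetical tables: staging_events holds the raw input,
-- events is the target table, partitioned by ds.
-- Grouping on every selected column collapses exact duplicate rows
-- before they are written to the target partition.
INSERT OVERWRITE TABLE events PARTITION (ds = '2009-01-29')
SELECT s.event_id, s.user_id, s.event_time
FROM staging_events s
GROUP BY s.event_id, s.user_id, s.event_time;
```

Because the partition is overwritten rather than appended to, rerunning the 
same load does not add a second copy of the data.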



________________________________
From: Jeff Hammerbacher <[email protected]>
Reply-To: <[email protected]>
Date: Thu, 29 Jan 2009 11:49:29 -0800
To: <[email protected]>
Cc: Zheng Shao <[email protected]>
Subject: Re: data integrity

Hey Shane,

One possibility would be to run a MapReduce/Hive job after the load that checks 
that your integrity constraints are met.

Regards,
Jeff
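
One way such a post-load check might look for a uniqueness constraint is a 
query that counts rows per key and reports any key seen more than once. The 
table name (events), key column (event_id), and partition value below are 
hypothetical, chosen only for illustration.

```sql
-- Hypothetical table and column names.
-- The inner query counts rows per key within the freshly loaded partition;
-- the outer query keeps only keys that occur more than once.
SELECT d.event_id, d.cnt
FROM (
  SELECT event_id, COUNT(1) AS cnt
  FROM events
  WHERE ds = '2009-01-29'
  GROUP BY event_id
) d
WHERE d.cnt > 1;
```

An empty result would indicate that the key is unique within that partition; 
any rows returned identify the duplicated keys.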

On Thu, Jan 29, 2009 at 10:03 AM, Zheng Shao <[email protected]> wrote:
IF was just added last evening.

I will add it to wiki today.

We don't have CASE, DECODE, etc. yet.


Zheng
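
For illustration, IF takes a condition, a value to return when the condition 
is true, and a value to return otherwise, so nested IF calls can stand in for a 
simple CASE expression. The table and column names in this sketch (events, 
event_id, status) are hypothetical.

```sql
-- Hypothetical table and columns.
-- Nested IF calls emulate a three-way CASE on the status column:
-- 0 maps to 'ok', 1 maps to 'warning', anything else to 'error'.
SELECT event_id,
       IF(status = 0, 'ok',
          IF(status = 1, 'warning', 'error')) AS status_label
FROM events;
```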



On 1/29/09, Shane Brady <[email protected]> wrote:
> Hello,
>
> I'm rather new to Hive and have been playing with it the last couple weeks
> to see if it is appropriate to use for a particular project inside where I
> work.  My essential question is, how to maintain data integrity inside the
> tables so that we don't accidentally load duplicate data.  Normally we rely
> on indexes or unique keys to enforce this.  Is there a general strategy for
> this in Hive?
>
> In a second question, I haven't seen anything like it in the docs, but is
> there any equivalent to CASE, DECODE, or IF-THEN-ELSE allowed in the query?
>
> Thanks!
>
> -Shane P. Brady
>

Yours,
Zheng
