RE: UPDATE statement in Hive?

Ashish Thusoo Tue, 28 Jul 2009 19:17:55 -0700

That would be great, Yongqiang.

Amr, we don't have that kind of support but would love to add it.

Ashish

________________________________
From: He Yongqiang [mailto:[email protected]]
Sent: Tuesday, July 28, 2009 7:03 PM
To: [email protected]
Subject: Re: UPDATE statement in Hive?

The patch contributor of https://issues.apache.org/jira/browse/PIG-6 is a 
student here in our institute, but in another laboratory.
If Hive is interested in this, I will get in touch with him to see if he would 
like to make a similar contribution to Hive.

On 09-7-29 8:10 AM, "Peter Skomoroch" <[email protected]> wrote:

+1 for Hive queries on HBase - that would be a powerful combination.

On Tue, Jul 28, 2009 at 8:05 PM, Amr Awadallah <[email protected]> wrote:
Saurabh, I think you are better off with HBase for this kind of use; see:

http://hadoop.apache.org/hbase/

In a nutshell, HBase is a layer on top of HDFS which supports two things: (1) 
quick lookups based on keys (e.g. a userid), and (2) transaction semantics at 
the row-level (update/delete/insert values for a given key).

Ashish, is there any way to run Hive queries on top of HBase? Pig has support 
for that via this patch:

https://issues.apache.org/jira/browse/PIG-6

-- amr

Ashish Thusoo wrote:
There is no UPDATE statement at this time. Because a file in Hadoop cannot be 
updated in place, an UPDATE in Hive, though possible, would just be syntactic 
sugar for merging the new values into the old data in the table and then 
rewriting the table with the merged output. This can be achieved by doing an 
INSERT OVERWRITE on the old table from the results of a merge done by a left 
outer join between the old table and the new data staged in another table. Also 
note that while you are updating the table, queries currently running on the 
table may fail.
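
A minimal sketch of the merge-and-rewrite approach described above. It is an 
illustration only: an in-memory SQLite database stands in for Hive, SQLite has 
no INSERT OVERWRITE (so the overwrite is emulated with a delete followed by an 
insert), and the table and column names (user_visits, staged_visits, 
first_visited_at, last_visited_at) are hypothetical. It also assumes at most 
one staged row per user.

```python
import sqlite3


def merge_visits(conn):
    """Rewrite user_visits from a left outer join against the staged updates."""
    cur = conn.cursor()
    # Merge step: keep every old row, taking the staged last_visited_at
    # wherever a staged row exists for that user.
    cur.execute("""
        CREATE TABLE merged AS
        SELECT o.user_id,
               o.first_visited_at,
               COALESCE(n.last_visited_at, o.last_visited_at) AS last_visited_at
        FROM user_visits o
        LEFT OUTER JOIN staged_visits n ON o.user_id = n.user_id
    """)
    # Rewrite step: SQLite has no INSERT OVERWRITE, so emulate it by
    # replacing the old table's contents with the merged output.
    cur.execute("DELETE FROM user_visits")
    cur.execute("INSERT INTO user_visits SELECT * FROM merged")
    cur.execute("DROP TABLE merged")
    conn.commit()


# Hypothetical sample data: two known users, one staged update.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_visits (
        user_id TEXT, first_visited_at TEXT, last_visited_at TEXT);
    CREATE TABLE staged_visits (user_id TEXT, last_visited_at TEXT);
    INSERT INTO user_visits VALUES
        ('u1', '2009-01-05', '2009-03-01'),
        ('u2', '2009-02-10', '2009-02-10');
    INSERT INTO staged_visits VALUES ('u1', '2009-07-28');
""")
merge_visits(conn)
rows = conn.execute("SELECT * FROM user_visits ORDER BY user_id").fetchall()
print(rows)
```

Note that a left outer join from the old table only carries forward users that 
already exist there; users that appear only in the staged data would need to be 
added separately.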

Another option is to change your schema so that the table actually contains the 
changes to the row instead of the row values themselves, and then change the 
query to take the new schema into account.
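
One possible reading of this second option, sketched under assumptions: rather 
than storing first/last visit values per user, the table stores one row per 
visit (append-only), and the query derives the first and last visit with 
MIN/MAX. SQLite again stands in for Hive, and the names visit_events, 
record_visits and first_and_last_visits are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table holds the changes (one row per visit), not the current values.
conn.execute("CREATE TABLE visit_events (user_id TEXT, visited_at TEXT)")


def record_visits(conn, events):
    """Append new visit rows; existing rows are never modified."""
    conn.executemany("INSERT INTO visit_events VALUES (?, ?)", events)
    conn.commit()


def first_and_last_visits(conn):
    """Derive first and last visit per user at query time."""
    return conn.execute("""
        SELECT user_id, MIN(visited_at), MAX(visited_at)
        FROM visit_events
        GROUP BY user_id
        ORDER BY user_id
    """).fetchall()


# Hypothetical sample data: ISO dates, so text comparison orders them correctly.
record_visits(conn, [("u1", "2009-01-05"), ("u2", "2009-02-10")])
record_visits(conn, [("u1", "2009-07-28")])  # a later load only appends
summary = first_and_last_visits(conn)
print(summary)
```

The trade-off in this sketch is that no rewrite of existing data is needed, 
but every query pays for the aggregation over the full history.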

Ashish

________________________________________
From: Saurabh Nanda [[email protected]]
Sent: Tuesday, July 28, 2009 3:41 AM
To: [email protected]
Subject: UPDATE statement in Hive?

Is there an UPDATE statement in Hive? If not, are there any plans for adding 
support for it in the future?

This is why I ask: I want to maintain a table which, against each user ID, 
stores the first visit & last visit time. This is across the entire year, not a 
day -- basically to understand how many visitors we got in the last 1/3/6 months, 
etc.

I can add new users into a separate partition to get around the limitation of 
not being able to append rows to a table. However, I don't know how to update 
the last_visited_at column for each user.

Is this best achieved by storing this table outside of Hive in a traditional 
RDBMS? I could query Hive over JDBC for a list of distinct visitors today and, 
based on that list, update the 'external' table.

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com