Re: UPDATE statement in Hive?

He Yongqiang Tue, 28 Jul 2009 19:03:08 -0700

The patch contributor of https://issues.apache.org/jira/browse/PIG-6 is a
student at our institute, though in a different laboratory.
If Hive is interested in this, I will get in touch with him to see if he
would like to make a similar contribution to Hive.


On 09-7-29 8:10 AM, "Peter Skomoroch" <[email protected]> wrote:

> +1 for Hive queries on HBase - that would be a powerful combination.
> 
> On Tue, Jul 28, 2009 at 8:05 PM, Amr Awadallah <[email protected]> wrote:
>> Saurabh, I think you are better off with HBase for this kind of use, see:
>> 
>> http://hadoop.apache.org/hbase/
>> 
>> In a nutshell, HBase is a layer on top of HDFS which supports two things: (1)
>> quick lookups based on keys (e.g. a userid), and (2) transactional semantics
>> at the row level (update/delete/insert values for a given key).
>> 
>> Ashish, is there any way to run Hive queries on top of HBase? Pig has support
>> for that via this patch:
>> 
>> https://issues.apache.org/jira/browse/PIG-6
>> 
>> -- amr
>> 
>> 
>> Ashish Thusoo wrote:
>>> There is no UPDATE statement at this time. Since Hadoop does not support
>>> updating a file in place, an update in Hive, though possible, would just be
>>> syntactic sugar for merging the new values into the old data in the table
>>> and then rewriting the table with the merged output. You can achieve this
>>> today with an INSERT OVERWRITE on the old table, selecting the results of a
>>> merge done by a LEFT OUTER JOIN between the old table and the new data
>>> staged in another table. Also note that while you are updating the table,
>>> queries currently running against it may fail.
>>> 
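The merge Ashish describes can be sketched in Python, with in-memory dicts standing in for Hive tables. This is only an illustration of the left-outer-join semantics, not Hive code; the table layout and the function name `overwrite_with_merge` are hypothetical, modeled on Saurabh's first/last-visit table. The function returns the entire new table, because an INSERT OVERWRITE replaces the table's contents rather than editing rows in place.

```python
from typing import Dict, Tuple

# Hypothetical layout: user_id -> (first_visit, last_visit) as ISO-8601 date strings.
Table = Dict[str, Tuple[str, str]]


def overwrite_with_merge(old: Table, staged_last_visit: Dict[str, str]) -> Table:
    """Return the complete new contents of the table, mimicking
    INSERT OVERWRITE ... SELECT ... FROM old LEFT OUTER JOIN staged."""
    merged: Table = {}
    for user_id, (first_visit, last_visit) in old.items():
        # Left outer join: the old row is always kept; a staged match (if any)
        # supplies the new last_visit, like COALESCE(new, old) in SQL.
        new_last = staged_last_visit.get(user_id)
        merged[user_id] = (first_visit, new_last if new_last is not None else last_visit)
    return merged


old = {"u1": ("2009-01-05", "2009-06-30"), "u2": ("2009-03-10", "2009-07-01")}
staged = {"u1": "2009-07-28", "u3": "2009-07-28"}
result = overwrite_with_merge(old, staged)
print(result)
```

Note that `u3`, present only in the staged data, is dropped: a left outer join keeps only keys from the old table, so brand-new users still have to be added separately, for example through the separate partition Saurabh mentions.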
>>> Another option is to change your schema so that the table stores the
>>> changes to each row instead of the row values themselves, and then change
>>> your queries to take the new schema into account.
>>> 
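The change-log alternative can be sketched the same way: instead of one mutable row per user, keep an append-only list of visits and derive the first and last visit at query time with MIN/MAX aggregation. This is again a hypothetical Python illustration, not Hive code; the `VisitLog` layout and both function names are assumptions, and the trailing window is counted in days, so "last N months" has to be approximated (e.g. as 30 * N days).

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical schema: one (user_id, visit_date) row per visit. New days are
# only ever appended (e.g. as a new partition); no row is updated.
VisitLog = List[Tuple[str, date]]


def first_and_last_visits(log: VisitLog) -> Dict[str, Tuple[date, date]]:
    """Like SELECT user_id, MIN(visit_date), MAX(visit_date) ... GROUP BY user_id."""
    result: Dict[str, Tuple[date, date]] = {}
    for user_id, day in log:
        if user_id in result:
            first, last = result[user_id]
            result[user_id] = (min(first, day), max(last, day))
        else:
            result[user_id] = (day, day)
    return result


def visitors_since(log: VisitLog, today: date, days: int) -> int:
    """Count distinct users whose most recent visit is within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return sum(1 for _, last in first_and_last_visits(log).values() if last >= cutoff)


log: VisitLog = [
    ("u1", date(2009, 1, 5)),
    ("u2", date(2009, 3, 10)),
    ("u2", date(2009, 4, 1)),
    ("u1", date(2009, 7, 28)),
]
today = date(2009, 7, 28)
print(first_and_last_visits(log))
print(visitors_since(log, today, 30), visitors_since(log, today, 180))
```

Because visits are only ever appended, each new day can go into its own partition and no existing data has to be rewritten; the cost moves to query time, where the aggregation scans the whole log.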
>>> Ashish
>>> 
>>> ________________________________________
>>> From: Saurabh Nanda [[email protected]]
>>> Sent: Tuesday, July 28, 2009 3:41 AM
>>> To: [email protected]
>>> Subject: UPDATE statement in Hive?
>>> 
>>> Is there an UPDATE statement in Hive? If not, are there any plans for adding
>>> support for it in the future?
>>> 
>>> This is why I ask: I want to maintain a table which, for each user ID,
>>> stores the first visit & last visit time. This is across the entire year,
>>> not a day -- basically to understand how many visitors we got in the last
>>> 1/3/6 months, etc.
>>> 
>>> I can add new users into a separate partition to get around the limitation
>>> of not being able to append rows to a table. However, I don't know how to
>>> update the last_visited_at column for each user.
>>> 
>>> Is this best achieved by storing this table outside of Hive in a traditional
>>> RDBMS? I would use JDBC to query Hive for the list of distinct visitors
>>> today and, based on that list, update the 'external' table.
>>> 
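The RDBMS route Saurabh proposes could look roughly like the following, using Python's built-in `sqlite3` module as a stand-in for the traditional database. The Hive/JDBC step is not shown: the hard-coded visitor lists are assumed to be what the daily Hive query returns, and the `user_visits` table and `apply_daily_visitors` function are hypothetical names.

```python
import sqlite3
from typing import Iterable


def apply_daily_visitors(conn: sqlite3.Connection, day: str, user_ids: Iterable[str]) -> None:
    """Fold one day's distinct visitors (assumed to come from Hive) into the
    hypothetical external user_visits table."""
    with conn:  # commit on success, roll back on error
        for user_id in user_ids:
            # New users get a row whose first and last visit are both `day`;
            # INSERT OR IGNORE leaves existing rows untouched.
            conn.execute(
                "INSERT OR IGNORE INTO user_visits (user_id, first_visit, last_visit) "
                "VALUES (?, ?, ?)",
                (user_id, day, day),
            )
            # Existing users only ever move last_visit forward (ISO dates sort as text).
            conn.execute(
                "UPDATE user_visits SET last_visit = MAX(last_visit, ?) WHERE user_id = ?",
                (day, user_id),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_visits (user_id TEXT PRIMARY KEY, first_visit TEXT, last_visit TEXT)"
)
apply_daily_visitors(conn, "2009-07-27", ["u1", "u2"])
apply_daily_visitors(conn, "2009-07-28", ["u1", "u3"])
print(conn.execute("SELECT * FROM user_visits ORDER BY user_id").fetchall())
```

INSERT OR IGNORE followed by UPDATE is used here instead of an upsert clause so the sketch does not depend on a particular SQLite version.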
>>> Saurabh.
>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
