Re: UPDATE statement in Hive?

He Yongqiang Tue, 28 Jul 2009 20:51:25 -0700

Talked with Samuel Guo, and I am sure he will work on it soon.

On 09-7-29 10:15 AM, "Ashish Thusoo" <[email protected]> wrote:


> That would be great, Yongqiang.
>  
> Amr, we don't have that kind of support but would love to add it.
>  
> Ashish
> 
> 
> From: He Yongqiang [mailto:[email protected]]
> Sent: Tuesday, July 28, 2009 7:03 PM
> To: [email protected]
> Subject: Re: UPDATE statement in Hive?
> 
> The contributor of the patch at https://issues.apache.org/jira/browse/PIG-6
> is a student here at our institute, but in another laboratory.
> If Hive is interested in this, I will get in touch with him to see if he would
> like to make a similar contribution to Hive.
> 
> On 09-7-29 8:10 AM, "Peter Skomoroch" <[email protected]> wrote:
> 
>> +1 for Hive queries on HBase - that would be a powerful combination.
>> 
>> On Tue, Jul 28, 2009 at 8:05 PM, Amr Awadallah <[email protected]> wrote:
>>  
>>> Saurabh, I think you are better off with HBase for this kind of use, see:
>>> 
>>> http://hadoop.apache.org/hbase/
>>> 
>>> In a nutshell, HBase is a layer on top of HDFS which supports two things:
>>> (1) quick lookups based on keys (e.g. a userid), and (2) transaction
>>> semantics at the row level (update/delete/insert values for a given key).
>>> 
>>> Ashish, is there any way to run Hive queries on top of HBase? Pig has
>>> support for that via this patch:
>>> 
>>> https://issues.apache.org/jira/browse/PIG-6
>>> 
>>> -- amr
>>> 
>>> 
>>> Ashish Thusoo wrote:
>>>  
>>>> There is no UPDATE statement at this time. Because files in Hadoop cannot
>>>> be updated in place, an UPDATE in Hive, though possible, would just be
>>>> syntactic sugar for merging the new values into the old data in the table
>>>> and then rewriting the table with the merged output. This can be achieved
>>>> by doing an INSERT OVERWRITE on the old table from the results of a merge
>>>> done by a left outer join of the old table with the new data staged in
>>>> another table. Also note that while you are updating the table, queries
>>>> currently running on the table may fail.
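
The merge-then-rewrite approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for Hive, the table names `visits` and `staged` and their columns are hypothetical, and the DELETE plus re-INSERT step merely imitates what an INSERT OVERWRITE would do in Hive.

```python
import sqlite3

# sqlite3 is only a stand-in for Hive here; all table and column names
# below are hypothetical and chosen for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE visits (user_id TEXT, first_visit TEXT, last_visit TEXT)")
cur.execute("CREATE TABLE staged (user_id TEXT, last_visit TEXT)")

# Existing ("old") table contents.
cur.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [("u1", "2009-01-05", "2009-03-01"), ("u2", "2009-02-10", "2009-02-10")],
)
# New values staged in a separate table.
cur.executemany(
    "INSERT INTO staged VALUES (?, ?)",
    [("u1", "2009-07-28"), ("u3", "2009-07-28")],
)

# Step 1: merge old and new data with a left outer join. Where a staged
# value exists it replaces the old one; otherwise the old value is kept.
merged = cur.execute(
    """
    SELECT v.user_id, v.first_visit, COALESCE(s.last_visit, v.last_visit)
    FROM visits v
    LEFT OUTER JOIN staged s ON v.user_id = s.user_id
    """
).fetchall()

# Step 2: rewrite the whole table with the merged output
# (imitating an overwrite of the old table).
cur.execute("DELETE FROM visits")
cur.executemany("INSERT INTO visits VALUES (?, ?, ?)", merged)
conn.commit()

result = cur.execute("SELECT * FROM visits ORDER BY user_id").fetchall()
print(result)
```

Note that with a left outer join from the old table, a staged key that is not yet in the old table (`u3` in this sketch) is not carried over, so new users would still have to be added separately.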
>>>> 
>>>> Another option is to change your schema so that the table actually
>>>> contains the changes to the row instead of the row values themselves, and
>>>> then change the query to take the new schema into account.
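
One possible reading of this second option, again sketched with sqlite3 as a stand-in for Hive: the table is append-only and holds one row per change (here, one row per visit), and the query derives the current values by aggregation. The table name `visit_events` and its columns are assumptions made for this example, not anything defined in the thread.

```python
import sqlite3

# sqlite3 stands in for Hive; `visit_events` and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Append-only table: each row records a change (a visit), not the
# current state of a user, so no row ever needs to be updated.
cur.execute("CREATE TABLE visit_events (user_id TEXT, visited_at TEXT)")
cur.executemany(
    "INSERT INTO visit_events VALUES (?, ?)",
    [("u1", "2009-01-05"), ("u2", "2009-02-10"), ("u1", "2009-07-28")],
)

# The query takes the new schema into account: first and last visit are
# computed per user at read time. ISO-formatted dates sort correctly as text.
summary = cur.execute(
    """
    SELECT user_id, MIN(visited_at) AS first_visit, MAX(visited_at) AS last_visit
    FROM visit_events
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(summary)
```

The trade-off is that every read pays for the aggregation, in exchange for never having to rewrite existing data.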
>>>> 
>>>> Ashish
>>>> 
>>>> ________________________________________
>>>> From: Saurabh Nanda [[email protected]]
>>>> Sent: Tuesday, July 28, 2009 3:41 AM
>>>> To: [email protected]
>>>> Subject: UPDATE statement in Hive?
>>>> 
>>>> Is there an UPDATE statement in Hive? If not, are there any plans for
>>>> adding support for it in the future?
>>>> 
>>>> This is why I ask: I want to maintain a table which, against each user ID,
>>>> stores the first visit & last visit time. This is across the entire year,
>>>> not a day -- basically to understand how many visitors we got in the last
>>>> 1/3/6 months, etc.
>>>> 
>>>> I can add new users into a separate partition to get around the limitation
>>>> of not being able to append rows to a table. However, I don't know how to
>>>> update the last_visited_at column for each user.
>>>> 
>>>> Is this best achieved by storing this table outside of Hive in a
>>>> traditional RDBMS? Using JDBC, I could query Hive for a list of distinct
>>>> visitors today and, based on that list, update the 'external' table.
>>>> 
>>>> Saurabh.
>>>> --
>>>> http://nandz.blogspot.com
>>>> http://foodieforlife.blogspot.com
>>>> 
>>>> 
>>>>
