Re: UPDATE statement in Hive?

Peter Skomoroch Tue, 28 Jul 2009 17:11:00 -0700

+1 for Hive queries on HBase - that would be a powerful combination.

On Tue, Jul 28, 2009 at 8:05 PM, Amr Awadallah <[email protected]> wrote:


> Saurabh, I think you're better off with HBase for this kind of use, see:
>
> http://hadoop.apache.org/hbase/
>
> In a nutshell, HBase is a layer on top of HDFS which supports two things:
> (1) quick lookups based on keys (e.g. a userid), and (2) transaction
> semantics at the row-level (update/delete/insert values for a given key).
>
> Ashish, is there any way to run Hive queries on top of HBase? Pig has
> support for that via this patch:
>
> https://issues.apache.org/jira/browse/PIG-6
>
> -- amr
>
>
> Ashish Thusoo wrote:
>
>> There is no update statement at this time. Because there is no way to
>> update a file in Hadoop, an update in Hive, though possible, would just be
>> syntactic sugar for merging the new values into the old data in the table
>> and then rewriting the table with the merged output. This can be achieved
>> by doing an insert overwrite on the old table from the results of a merge
>> done by a left outer join of the old table and the new data staged in
>> another table. Also note that while you are updating the table, queries
>> currently running on the table may fail.
>>
>> Another option is to change your schema so that the table actually
>> contains the changes to the row instead of the row values themselves, and
>> then change the query to take the new schema into account.
>>
>> Ashish
>>
>> ________________________________________
>> From: Saurabh Nanda [[email protected]]
>> Sent: Tuesday, July 28, 2009 3:41 AM
>> To: [email protected]
>> Subject: UPDATE statement in Hive?
>>
>> Is there an UPDATE statement in Hive? If not, are there any plans for
>> adding support for it in the future?
>>
>> This is why I ask: I want to maintain a table which, against each user ID,
>> stores the first visit & last visit time. This is across the entire year,
>> not a day -- basically to understand how many visitors we got in the last
>> 1/3/6 months, etc.
>>
>> I can add new users into a separate partition to get around the limitation
>> of not being able to append rows to a table. However, I don't know how to
>> update the last_visited_at column for each user.
>>
>> Is this best achieved by storing this table outside of Hive in a
>> traditional RDBMS? Using JDBC, I could query Hive for a list of today's
>> distinct visitors and, based on that list, update the 'external' table.
>>
>> Saurabh.
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>>
>
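
For reference, the merge Ashish describes in the quoted text (left outer join of the old table against staged new data, then overwrite the old table) could look roughly like the sketch below. This is only an illustration under assumptions: it uses Python's sqlite3 as a stand-in for Hive, the table and column names (user_visits, staged_visits, first_visit, last_visit) are made up, and DELETE plus INSERT stands in for Hive's insert overwrite. It also assumes at most one staged row per user.

```python
import sqlite3


def merge_last_visits(conn):
    """Rebuild user_visits with last_visit refreshed from staged_visits."""
    cur = conn.cursor()
    # The left outer join keeps every existing user; COALESCE picks the
    # staged last_visit when there is one, else keeps the old value.
    cur.execute("""
        CREATE TABLE merged AS
        SELECT o.user_id,
               o.first_visit,
               COALESCE(n.last_visit, o.last_visit) AS last_visit
        FROM user_visits o
        LEFT OUTER JOIN staged_visits n ON o.user_id = n.user_id
    """)
    # Stand-in for an insert overwrite: replace the old contents wholesale.
    cur.execute("DELETE FROM user_visits")
    cur.execute("INSERT INTO user_visits SELECT * FROM merged")
    cur.execute("DROP TABLE merged")
    conn.commit()


def demo_merge():
    # Hypothetical sample data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user_visits (user_id TEXT, first_visit TEXT, last_visit TEXT);
        CREATE TABLE staged_visits (user_id TEXT, last_visit TEXT);
        INSERT INTO user_visits VALUES ('u1', '2009-01-05', '2009-06-01');
        INSERT INTO user_visits VALUES ('u2', '2009-03-10', '2009-03-10');
        INSERT INTO staged_visits VALUES ('u1', '2009-07-28');
    """)
    merge_last_visits(conn)
    return conn.execute(
        "SELECT user_id, first_visit, last_visit FROM user_visits ORDER BY user_id"
    ).fetchall()
```

Note that a left outer join from the old table only refreshes users who already exist; brand-new users in the staged data would still have to be added separately, for example through the separate partition Saurabh mentions.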

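Ashish's second suggestion in the quoted text (store the changes rather than the current row values, and let the query derive the result) might be sketched as follows for the first/last visit use case. Again this is only an assumption-laden illustration: sqlite3 stands in for Hive, and the visit_log table and its columns are hypothetical names. Visits are only ever appended, and the first and last visit are computed at query time.

```python
import sqlite3


def first_and_last_visits(conn):
    """Derive first and last visit per user from an append-only visit log."""
    # ISO-8601 date strings sort lexicographically, so MIN/MAX work on them.
    return conn.execute("""
        SELECT user_id, MIN(visited_at), MAX(visited_at)
        FROM visit_log
        GROUP BY user_id
        ORDER BY user_id
    """).fetchall()


def demo_log():
    # Hypothetical sample data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visit_log (user_id TEXT, visited_at TEXT)")
    conn.executemany(
        "INSERT INTO visit_log VALUES (?, ?)",
        [
            ("u1", "2009-01-05"),
            ("u2", "2009-03-10"),
            ("u1", "2009-07-28"),
        ],
    )
    return first_and_last_visits(conn)
```

The trade-off is that no row is ever rewritten, at the cost of scanning the whole log (or the relevant partitions) on each query.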

-- 
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch
