+1 for Hive queries on HBase - that would be a powerful combination.

On Tue, Jul 28, 2009 at 8:05 PM, Amr Awadallah <[email protected]> wrote:
> Saurabh, I think you're better off with HBase for this kind of use, see:
>
> http://hadoop.apache.org/hbase/
>
> In a nutshell, HBase is a layer on top of HDFS which supports two things:
> (1) quick lookups based on keys (e.g. a userid), and (2) transaction
> semantics at the row level (update/delete/insert values for a given key).
>
> Ashish, is there any way to run Hive queries on top of HBase? Pig has
> support for that via this patch:
>
> https://issues.apache.org/jira/browse/PIG-6
>
> -- amr
>
>
> Ashish Thusoo wrote:
>
>> There is no update statement at this time. Since there is no update of a
>> file in Hadoop, an update in Hive, though possible, would just be syntactic
>> sugar for merging the new values into the old data in the table and then
>> rewriting the table with the merged output. This can be achieved by doing an
>> insert overwrite on the old table from the results of a merge done by a left
>> outer join of the old table with the new data staged in another table. Also
>> note that while you are updating the table, queries currently running on the
>> table may fail.
>>
>> Another option is to change your schema so that the table actually
>> contains the changes to the row instead of the row values themselves, and
>> then change the query to take the new schema into account.
>>
>> Ashish
>>
>> ________________________________________
>> From: Saurabh Nanda [[email protected]]
>> Sent: Tuesday, July 28, 2009 3:41 AM
>> To: [email protected]
>> Subject: UPDATE statement in Hive?
>>
>> Is there an UPDATE statement in Hive? If not, are there any plans for
>> adding support for it in the future?
>>
>> This is why I ask: I want to maintain a table which, against each user ID,
>> stores the first visit & last visit time. This is across the entire year,
>> not a day -- basically to understand how many visitors we got in the last
>> 1/3/6 months, etc.
>>
>> I can add new users into a separate partition to get around the limitation
>> of not being able to append rows to a table. However, I don't know how to
>> update the last_visited_at column for each user.
>>
>> Is this best achieved by storing this table outside of Hive in a
>> traditional RDBMS? Using JDBC, I could query Hive for a list of distinct
>> visitors today and, based on that list, update the 'external' table.
>>
>> Saurabh.
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>>
>

--
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch
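
For reference, the merge Ashish describes (a left outer join of the old table
with the staged data, followed by a full rewrite of the table) could be
sketched roughly as below. This is only an illustration under stated
assumptions: it uses Python's sqlite3 module as a stand-in so that it runs
anywhere, not Hive itself; the table names user_visits and staged_visits and
the column first_visited_at are made up for the example (only last_visited_at
appears in the thread); and the drop-and-rename step merely imitates what an
insert overwrite would do in Hive.

```python
import sqlite3


def merge_last_visits(conn):
    """Rewrite user_visits with last_visited_at merged in from staged_visits.

    Assumed (hypothetical) schema:
      user_visits(user_id, first_visited_at, last_visited_at)
      staged_visits(user_id, last_visited_at)
    """
    cur = conn.cursor()
    # Build the merged output: every row of the old table is kept, and the
    # staged last-visit time wins wherever a staged row exists for that user.
    cur.execute("""
        CREATE TABLE merged AS
        SELECT o.user_id          AS user_id,
               o.first_visited_at AS first_visited_at,
               COALESCE(n.last_visited_at, o.last_visited_at) AS last_visited_at
        FROM user_visits o
        LEFT OUTER JOIN staged_visits n ON o.user_id = n.user_id
    """)
    # Stand-in for the insert overwrite: replace the old table wholesale
    # with the merged result instead of updating rows in place.
    cur.execute("DROP TABLE user_visits")
    cur.execute("ALTER TABLE merged RENAME TO user_visits")
    conn.commit()
```

Note that a left outer join from the old table keeps only users that already
exist there. Users seen for the first time in the staged data would still have
to be added separately, for example through the separate partition Saurabh
mentions.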
