It's the usual tradeoff. One approach is ETL (pump the data from HBase into Hive and then analyze it there). The benefit is that once the data is in Hive, queries against it will typically run faster (since Hive is optimized for warehousing). The drawback is staleness: you won't be querying the very latest data.
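
To make the staleness drawback concrete, here is a minimal sketch of the check a client could run before trusting the Hive copy. Everything in it is an assumption made for illustration: the function name `hive_copy_is_fresh_enough`, the idea that you record when the last ETL run finished, and the example timestamps and tolerances. Neither Hive nor HBase provides this as an API.

```python
from datetime import datetime, timedelta


def hive_copy_is_fresh_enough(last_etl_finished, now, max_staleness):
    """Return True if the Hive copy is recent enough for a query.

    Hypothetical helper: last_etl_finished is when the most recent
    HBase-to-Hive load completed, and max_staleness is how old the
    data may be for this particular query.
    """
    staleness = now - last_etl_finished
    return staleness <= max_staleness


# Illustrative values only: an ETL run that finished at midnight,
# checked about two and a half hours later.
last_etl = datetime(2010, 5, 19, 0, 0)
now = datetime(2010, 5, 19, 2, 27)

# A daily report tolerates data up to 24 hours old.
daily_report_ok = hive_copy_is_fresh_enough(last_etl, now, timedelta(hours=24))

# A live dashboard tolerates at most 5 minutes of lag.
live_dashboard_ok = hive_copy_is_fresh_enough(last_etl, now, timedelta(minutes=5))
```

With these made-up numbers the daily report can use the Hive copy, while the live dashboard cannot, because the copy is more than two hours old.
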
The other approach is to query the latest data directly in HBase: the data is up to date, but queries run slower and add load to your HBase cluster. You may want to combine both approaches: run ETL and send most queries to the Hive data, but hit HBase when you need the latest.

JVS

________________________________________
From: SingoWong [[email protected]]
Sent: Wednesday, May 19, 2010 2:27 AM
To: [email protected]
Subject: How to use Hive for HBase

Hi,

I'm confused about Hive and HBase. HBase is a database and Hive is a warehouse. If I want to run statistics and analysis on data from the warehouse, but my source data is stored in HBase, should I move my data from HBase to Hive?

Thanks & Regards,
Singo
