It's the usual tradeoff.  

One approach is ETL (pump the data from HBase into Hive and then analyze it 
there).  The benefit is that once the data is in Hive, queries against it will 
typically run faster (since Hive is optimized for warehousing).  The drawback 
is staleness:  you won't be querying the very latest data.

The other approach is direct queries against the latest data in HBase:  
up-to-date data, but slower query performance (plus extra load on your HBase 
cluster).

You may consider using both approaches:  do ETL, and for most queries, run 
against the Hive data, but when you need the latest, hit HBase.
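A minimal sketch of that hybrid routing decision, assuming you record when the data in the last HBase-to-Hive ETL snapshot was taken. The names `ETLState`, `snapshot_time`, and `choose_backend` are hypothetical and not part of Hive or HBase. The function only decides where a query should go; actually running it against Hive or HBase is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ETLState:
    """Hypothetical bookkeeping: when the data in the last ETL snapshot was read from HBase."""
    snapshot_time: datetime


def choose_backend(state: ETLState, max_staleness: timedelta, now: datetime) -> str:
    """Return "hive" if the warehouse copy is fresh enough for this query, else "hbase".

    The Hive copy lags HBase by (now - snapshot_time), so it is acceptable
    whenever that lag is within the caller's tolerance.
    """
    lag = now - state.snapshot_time
    return "hive" if lag <= max_staleness else "hbase"


# Example: a nightly ETL snapshot taken at 02:00; it is now 14:00 (12 hours of lag).
state = ETLState(snapshot_time=datetime(2010, 5, 19, 2, 0))
now = datetime(2010, 5, 19, 14, 0)
print(choose_backend(state, timedelta(days=1), now))     # hive: a day-old report is fine
print(choose_backend(state, timedelta(minutes=5), now))  # hbase: needs near-real-time data
```

Keying the decision on the caller's tolerated staleness, not on query type, keeps most traffic on the faster warehouse copy. Only queries that genuinely need the latest rows fall through to HBase.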

JVS

________________________________________
From: SingoWong [[email protected]]
Sent: Wednesday, May 19, 2010 2:27 AM
To: [email protected]
Subject: How to use Hive for HBase

Hi,

I'm confused about Hive and HBase.
HBase is a database and Hive is a warehouse. If I want to run statistics and 
analysis on data in the warehouse, but my source data is stored in HBase, 
should I move my data from HBase to Hive?

Thanks & Regards,
Singo
