Currently you need to tell Hive about the column information (what names to use in Hive, and how they map into colfamily:colname in HBase) as part of your CREATE EXTERNAL TABLE statement.
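To make this concrete, here is a minimal Java sketch of the kind of utility suggested below: it turns a list of HBase `family:qualifier` names into a CREATE EXTERNAL TABLE string for Hive's HBase storage handler. It assumes a Hive release whose `org.apache.hadoop.hive.hbase.HBaseStorageHandler` accepts `:key` for the row key in the `hbase.columns.mapping` SERDEPROPERTY and reads the HBase table name from `hbase.table.name` in TBLPROPERTIES. The class and method names (`HiveHBaseDdl`, `buildDdl`, `hiveName`) and the sample table and column names are hypothetical. The sketch does not connect to HBase. HBase table metadata lists column families but not qualifiers, so the column list is assumed to be known already, for example from sampling rows.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class HiveHBaseDdl {
    /**
     * Builds a CREATE EXTERNAL TABLE statement for Hive's HBase storage handler.
     * hiveToHBase maps Hive column names to "family:qualifier" strings (in order);
     * the HBase row key is always exposed as the first Hive column, "key".
     * Every column is declared as string here (an assumption for simplicity).
     */
    public static String buildDdl(String hiveTable, String hbaseTable,
                                  Map<String, String> hiveToHBase) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner mapping = new StringJoiner(",");
        columns.add("key string");
        mapping.add(":key");
        for (Map.Entry<String, String> e : hiveToHBase.entrySet()) {
            columns.add(e.getKey() + " string");
            mapping.add(e.getValue());
        }
        return "CREATE EXTERNAL TABLE " + hiveTable + "(" + columns + ")\n"
            + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
            + "WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \"" + mapping + "\")\n"
            + "TBLPROPERTIES (\"hbase.table.name\" = \"" + hbaseTable + "\")";
    }

    /**
     * Hypothetical naming rule: replaces any character that is not a letter,
     * digit, or underscore with "_" and lowercases, so "info:name" -> "info_name".
     */
    public static String hiveName(String familyQualifier) {
        return familyQualifier.replaceAll("[^A-Za-z0-9_]", "_").toLowerCase();
    }

    public static void main(String[] args) {
        // Hypothetical column list; a real utility would discover it from HBase.
        List<String> hbaseColumns = List.of("info:name", "info:email", "stats:visits");
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String c : hbaseColumns) {
            mapping.put(hiveName(c), c);
        }
        System.out.println(buildDdl("hbase_users", "users", mapping));
    }
}
```

Declaring everything as `string` keeps the generated statement valid for any byte content; a real utility would likely let you choose Hive names and types per column, which is exactly the control a built-in default mapping might not give you.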
We could support some kind of default mapping in Hive for CREATE EXTERNAL TABLE, but it might not produce exactly the mapping you want. Instead, you can write a Java utility that reads the HBase metadata and constructs a CREATE EXTERNAL TABLE string exactly the way you want it.

JVS

________________________________________
From: Ray Duong [[email protected]]
Sent: Wednesday, May 19, 2010 1:02 PM
To: [email protected]
Subject: Re: How to use Hive for HBase

Hi John,

Is there an easy way to dump the HBase data into Hive (via HBase export) and have Hive read it without knowing all the column qualifiers?

Thanks,
-ray

On Wed, May 19, 2010 at 11:10 AM, John Sichi <[email protected]> wrote:

It's the usual tradeoff. One approach is ETL: pump the data from HBase into Hive and then analyze it there. The benefit is that once the data is in Hive, queries against it will typically run faster, since Hive is optimized for warehousing. The drawback is staleness: you won't be querying the very latest data.

The other approach is to query the latest data in HBase directly: the data is up to date, but query performance is slower and the queries add load to your HBase cluster.

You may consider using both approaches: do ETL and run most queries against the Hive data, but when you need the latest data, hit HBase.

JVS

________________________________________
From: SingoWong [[email protected]]
Sent: Wednesday, May 19, 2010 2:27 AM
To: [email protected]
Subject: How to use Hive for HBase

Hi,

I'm a bit confused about Hive and HBase. HBase is a database and Hive is a warehouse. If I want to do statistics and analysis on the data in the warehouse, but my source data is stored in HBase, should I move my data from HBase to Hive?

Thanks & Regards,
Singo
