[Hadoop Wiki] Update of "Hive/HBaseIntegration" by John Sichi

Apache Wiki Fri, 04 Jun 2010 14:23:17 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.


The "Hive/HBaseIntegration" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/HBaseIntegration?action=diff&rev1=28&rev2=29

--------------------------------------------------

  {{{
  CREATE TABLE hbase_table_1(key int, value string) 
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
- WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
+ WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
  TBLPROPERTIES ("hbase.table.name" = "xyz");
  }}}
  
@@ -140, +140 @@

   * for each Hive column, the table creator must specify a corresponding entry in the comma-delimited {{{hbase.columns.mapping}}} string (so for a Hive table with n columns, the string should have n entries); whitespace should '''not''' be used in between entries since these will be interpreted as part of the column name, which is almost certainly not what you want
   * a mapping entry must be either {{{:key}}} or of the form {{{column-family-name:[column-name]}}}
   * there must be exactly one {{{:key}}} mapping (we don't support compound keys yet)
-  ** note that before HIVE-1228, {{{:key}}} was not supported, and the first Hive column implicitly mapped to the key; as of HIVE-1228, it is now strongly recommended that you always specify the key explicitly; we will drop support for implicit key mapping in the future
+  * (note that before HIVE-1228, {{{:key}}} was not supported, and the first Hive column implicitly mapped to the key; as of HIVE-1228, it is now strongly recommended that you always specify the key explicitly; we will drop support for implicit key mapping in the future)
   * if no column-name is given, then the Hive column will map to all columns in the corresponding HBase column family, and the Hive MAP datatype must be used to allow access to these (possibly sparse) columns
   * there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.
   * since HBase does not associate datatype information with columns, the serde converts everything to string representation before storing it in HBase; there is currently no way to plug in a custom serde per column
@@ -210, +210 @@

  correspond to the map values.
  
  {{{
- CREATE TABLE hbase_table_1(key int, value map<string,int>) 
+ CREATE TABLE hbase_table_1(value map<string,int>, row_key int) 
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES (
- "hbase.columns.mapping" = ":key,cf:"
+ "hbase.columns.mapping" = "cf:,:key"
  );
- INSERT OVERWRITE TABLE hbase_table_1 SELECT foo, map(bar, foo) FROM pokes 
+ INSERT OVERWRITE TABLE hbase_table_1 SELECT map(bar, foo), foo FROM pokes 
  WHERE foo=98 OR foo=100;
  }}}
+ 
+ (This example also demonstrates using a Hive column other than the first as the HBase row key.)
  
  Here's how this looks in HBase (with different column names in different rows):
  
@@ -237, +239 @@

  Launching Job 1 out of 1
  ...
  OK
- 100   {"val_100":100}
+ {"val_100":100}       100
- 98    {"val_98":98}
+ {"val_98":98} 98
  Time taken: 3.808 seconds
  }}}
