Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/HBaseIntegration" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/HBaseIntegration?action=diff&rev1=28&rev2=29

--------------------------------------------------

  {{{
  CREATE TABLE hbase_table_1(key int, value string)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
- WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
+ WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
  TBLPROPERTIES ("hbase.table.name" = "xyz");
  }}}

@@ -140, +140 @@

   * for each Hive column, the table creator must specify a corresponding entry in the comma-delimited {{{hbase.columns.mapping}}} string (so for a Hive table with n columns, the string should have n entries); whitespace should '''not''' be used in between entries since these will be interpreted as part of the column name, which is almost certainly not what you want
   * a mapping entry must be either {{{:key}}} or of the form {{{column-family-name:[column-name]}}}
   * there must be exactly one {{{:key}}} mapping (we don't support compound keys yet)
-  ** note that before HIVE-1228, {{{:key}}} was not supported, and the first Hive column implicitly mapped to the key; as of HIVE-1228, it is now strongly recommended that you always specify the key explicitly; we will drop support for implicit key mapping in the future
+  * (note that before HIVE-1228, {{{:key}}} was not supported, and the first Hive column implicitly mapped to the key; as of HIVE-1228, it is now strongly recommended that you always specify the key explicitly; we will drop support for implicit key mapping in the future)
   * if no column-name is given, then the Hive column will map to all columns in the corresponding HBase column family, and the Hive MAP datatype must be used to allow access to these (possibly sparse) columns
   * there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.
   * since HBase does not associate datatype information with columns, the serde converts everything to string representation before storing it in HBase; there is currently no way to plug in a custom serde per column

@@ -210, +210 @@

  correspond to the map values.

  {{{
- CREATE TABLE hbase_table_1(key int, value map<string,int>)
+ CREATE TABLE hbase_table_1(value map<string,int>, row_key int)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES (
- "hbase.columns.mapping" = ":key,cf:"
+ "hbase.columns.mapping" = "cf:,:key"
  );
- INSERT OVERWRITE TABLE hbase_table_1 SELECT foo, map(bar, foo) FROM pokes
+ INSERT OVERWRITE TABLE hbase_table_1 SELECT map(bar, foo), foo FROM pokes
  WHERE foo=98 OR foo=100;
  }}}
+
+ (This example also demonstrates using a Hive column other than the first as the HBase row key.)

  Here's how this looks in HBase (with different column names in different rows):

@@ -237, +239 @@

  Launching Job 1 out of 1
  ...
  OK
- 100 {"val_100":100}
+ {"val_100":100} 100
- 98 {"val_98":98}
+ {"val_98":98} 98
  Time taken: 3.808 seconds
  }}}
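
The mapping rules listed in the diff (one entry per Hive column, no whitespace, each entry either :key or column-family-name:[column-name], exactly one :key) can be checked mechanically before running CREATE TABLE. The following is only a rough sketch of such a check; the function name check_columns_mapping and its messages are hypothetical and are not part of Hive, whose own validation may differ.

```python
from typing import List


def check_columns_mapping(mapping: str, hive_column_count: int) -> List[str]:
    """Return a list of rule violations; an empty list means the string passes.

    Hypothetical helper illustrating the rules from the wiki page; it does not
    call Hive or HBase and is not Hive's own validation logic.
    """
    problems = []
    entries = mapping.split(",")

    # Rule: a Hive table with n columns needs n mapping entries.
    if len(entries) != hive_column_count:
        problems.append(
            "expected %d entries, found %d" % (hive_column_count, len(entries))
        )

    for entry in entries:
        # Rule: whitespace would be taken as part of the column name.
        if any(ch.isspace() for ch in entry):
            problems.append("entry %r contains whitespace" % entry)
        if entry == ":key":
            continue
        # Rule: otherwise the entry must look like family:[column],
        # i.e. a non-empty family, a colon, and an optional column name.
        family, colon, _column = entry.partition(":")
        if not colon or not family:
            problems.append("entry %r is not :key or family:[column]" % entry)

    # Rule: exactly one :key entry (no compound keys).
    key_count = entries.count(":key")
    if key_count != 1:
        problems.append("expected exactly one :key entry, found %d" % key_count)

    return problems
```

Under these assumptions, both post-change mappings from the diff (":key,cf1:val" and "cf:,:key") pass for a two-column table, while the pre-HIVE-1228 form "cf1:val" is flagged for the missing :key entry and the entry count.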
