[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910305#action_12910305
 ] 

John Sichi commented on HIVE-1634:
----------------------------------

Hey Basab,

This is a great start.  Beyond the review comments I added, I do have some 
higher-level suggestions:

* For the column mapping, the reason I suggested "a:b:string" in the original 
JIRA description is that it's a pain to keep everything lined up by column 
position.  It's already less than ideal that we do the column name mapping by 
position, so I don't think we should make it worse by having a separate 
property for type.  Using the s/b shorthand is fine, and if you think that we 
shouldn't overload the colon, we can use a different separator, e.g. "cf:cq#s". 
 Since the existing property name is hbase.columns.mapping, I don't think it 
will be confusing to roll in the (optional) type info as well.

* I'm wondering whether we can just use the existing classes like 
LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of 
creating new ones.  Or are these not compatible with hbase.utils.Bytes?

* For the tests, I noticed that you have attached TestHiveHBaseExternalTable.  
I think it would be a good idea if you can create and populate such a fixture 
table in HBaseTestSetup; that way it can be available (treated as read-only) to 
all of the HBase .q tests.  Otherwise, it's hard to verify that we're 
compatible with a table created directly through HBase API's rather than Hive.

* Also for the tests, it would be good if you can filter it down to only a 
small number of representative rows when pulling the initial test data set from 
the Hive src table.  That way, we can keep the .q.out files smaller.

* Once we get this one committed, be sure to update the wiki.


> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
>                 Key: HIVE-1634
>                 URL: https://issues.apache.org/jira/browse/HIVE-1634
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: Basab Maulik
>            Assignee: Basab Maulik
>         Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
> specification of the storage option for the corresponding column in the serde 
> property "hbase.columns.mapping". Allowed values are '-' for table default, 
> 's' for standard string storage, and 'b' for binary storage as would be 
> obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families 
> use a colon separated pair such as 's:b' for the key and value part 
> specifiers respectively. See the test cases and queries for HBase handler for 
> additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string" 
> to specify a table level default storage type. The other valid specification 
> is "binary". The table level default is overridden by a column level 
> specification.
> This control is available for the boolean, tinyint, smallint, int, bigint, 
> float, and double primitive types. The attached patch also relaxes the 
> mapping of map types to HBase column families to allow any primitive type to 
> be the map key.
> Attached is a program for creating a table and populating it in HBase. The 
> external table in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double 
> double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties ("hbase.columns.mapping" = 
> ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 NULL    NULL    NULL    NULL    NULL    Test-String     NULL    NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double 
> double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = 
> ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
>     >  tblproperties (
>     >  "hbase.table.name" = "TestHiveHBaseExternalTable",
>     >  "hbase.table.default.storage.type" = "string");
> OK
> Time taken: 0.139 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true    -128    -32768  -2147483648     -9223372036854775808    
> Test-String     -2.1793132E-11  2.01345E291
> Time taken: 0.151 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.154 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double 
> double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = 
> ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true    -128    -32768  -2147483648     -9223372036854775808    
> Test-String     -2.1793132E-11  2.01345E291
> Time taken: 0.245 seconds
> hive> 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to