[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928334#action_12928334 ]
John Sichi commented on HIVE-1634:
----------------------------------

OK, I finally got some time to look into the Lazy* classes. I see what you mean about the class hierarchy, and I agree that we can leave any refactoring of the existing classes for a followup patch. Also, I was wrong to think that we could reuse the existing binary classes, since they do things such as VInt zero-compression, and that's incompatible with the HBase Bytes format.

However, for this patch, I want to at least get the new classes into their final destination with respect to package name and class name (so that we don't have to move them later, even if we adjust their inheritance). To this end, I suggest a new package serde2.lazydio, and name the classes on the pattern LazyDioInteger. The "Dio" is to indicate DataInput/DataOutput format. (I was thinking of lazybytes and LazyByteInteger, to indicate HBase Bytes format, but then I saw that Byte is also one of the datatypes, and LazyBytesByte would be puzzling.) Having both LazyIntegerBinary and LazyBinaryInteger, as in the current patch, would just be too confusing.

Also, regarding the implementation of the new classes, most of the init method code is duplicated from class to class. The only thing specific to each class is the actual read+set. Should we factor out a LazyDioObject (similar to the existing pattern for LazyObject and LazyBinaryObject)? Likewise for LazyDioPrimitive and LazyDioNonPrimitive.

I will ask some others to chime in on this as well.

> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
>                 Key: HIVE-1634
>                 URL: https://issues.apache.org/jira/browse/HIVE-1634
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: Basab Maulik
>            Assignee: Basab Maulik
>         Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a
> specification of the storage option for the corresponding column in the serde
> property "hbase.columns.mapping". Allowed values are '-' for table default,
> 's' for standard string storage, and 'b' for binary storage as would be
> obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families
> use a colon-separated pair such as 's:b' for the key and value part
> specifiers respectively. See the test cases and queries for HBase handler for
> additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string"
> to specify a table-level default storage type. The other valid specification
> is "binary". The table-level default is overridden by a column-level
> specification.
> This control is available for the boolean, tinyint, smallint, int, bigint,
> float, and double primitive types. The attached patch also relaxes the
> mapping of map types to HBase column families to allow any primitive type to
> be the map key.
> Attached is a program for creating a table and populating it in HBase. The
> external table in Hive can access the data as shown in the example below.
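As a rough illustration of the specifier rules in the description above (separate from the Hive session that follows), here is a minimal Java sketch that expands a "hbase.columns.storage.types" value into per-column storage types. The class and method names (StorageTypeSketch, expand, resolve) are hypothetical and are not taken from the patch; passing the table default in as a parameter is also an assumption about how the fallback would be supplied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not code from HIVE-1634.
class StorageTypeSketch {

    // Expands a single specifier: "-" defers to the table default,
    // "s" means string storage, "b" means binary storage.
    static String expand(String spec, String tableDefault) {
        switch (spec) {
            case "-": return tableDefault;
            case "s": return "string";
            case "b": return "binary";
            default:
                throw new IllegalArgumentException("Unknown storage type: " + spec);
        }
    }

    // Resolves a comma-separated specification such as "-,b,s:b".
    // A colon-separated pair (used for map columns) is expanded part by part,
    // key specifier first, then value specifier.
    static List<String> resolve(String storageTypes, String tableDefault) {
        List<String> resolved = new ArrayList<>();
        for (String column : storageTypes.split(",")) {
            List<String> expanded = new ArrayList<>();
            for (String part : column.trim().split(":")) {
                expanded.add(expand(part, tableDefault));
            }
            resolved.add(String.join(":", expanded));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Specification taken from the last query in the session;
        // "string" as the fallback is an assumed input here.
        System.out.println(resolve("-,b,b,b,b,b,-,b,b", "string"));
    }
}
```

With "string" supplied as the table default, the specification "-,b,b,b,b,b,-,b,b" resolves to string storage for the key and c_string columns and binary storage for the remaining seven columns.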
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     > c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     > with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
>     > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 NULL NULL NULL NULL NULL Test-String NULL NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     > c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     > with serdeproperties (
>     > "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     > "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
>     > tblproperties (
>     > "hbase.table.name" = "TestHiveHBaseExternalTable",
>     > "hbase.table.default.storage.type" = "string");
> OK
> Time taken: 0.139 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
> Time taken: 0.151 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.154 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     > c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     > with serdeproperties (
>     > "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     > "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
>     > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
> Time taken: 0.245 seconds
> hive>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.