[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573987#comment-13573987 ]
Brock Noland commented on HIVE-3179:
------------------------------------
I have verified that this is an issue on trunk, that the patch applies, and
that the patch addresses the issue. With the patch applied, the query from the
description returns the expected NULLs:
{noformat}
hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201302071609_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201302071609_0002
Kill Command = /opt/local/hadoop-1.1.1/libexec/../bin/hadoop job -kill job_201302071609_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-02-07 16:10:31,826 Stage-1 map = 0%, reduce = 0%
2013-02-07 16:10:34,846 Stage-1 map = 100%, reduce = 0%
2013-02-07 16:10:36,861 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201302071609_0002
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 260 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
c1-1 c3-1 c2-1 c2-1 c3-1 c3-1 c1-1
c1-2 NULL NULL NULL NULL NULL c1-2
Time taken: 10.702 seconds, Fetched: 2 row(s)
hive>
{noformat}
> HBase Handler doesn't handle NULLs properly
> -------------------------------------------
>
> Key: HIVE-3179
> URL: https://issues.apache.org/jira/browse/HIVE-3179
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Affects Versions: 0.9.0
> Reporter: Lars Francke
> Priority: Critical
> Attachments: HIVE-3179.1.patch
>
>
> We found a fairly severe issue in the HBase Handler: Hive can return
> incorrect data if a column has NULL values in HBase (which means the cell
> doesn't exist at all).
> In HBase Shell:
> {noformat}
> create 'hive_hbase_test', 'test'
> put 'hive_hbase_test', '1', 'test:c1', 'c1-1'
> put 'hive_hbase_test', '1', 'test:c2', 'c2-1'
> put 'hive_hbase_test', '1', 'test:c3', 'c3-1'
> put 'hive_hbase_test', '2', 'test:c1', 'c1-2'
> {noformat}
> In Hive:
> {noformat}
> DROP TABLE IF EXISTS hive_hbase_test;
> CREATE EXTERNAL TABLE hive_hbase_test (
> id int,
> c1 string,
> c2 string,
> c3 string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key#s,test:c1#s,test:c2#s,test:c3#s")
> TBLPROPERTIES("hbase.table.name" = "hive_hbase_test");
> hive> select * from hive_hbase_test;
> OK
> 1 c1-1 c2-1 c3-1
> 2 c1-2 NULL NULL
> hive> select c1 from hive_hbase_test;
> c1-1
> c1-2
> hive> select c1, c2 from hive_hbase_test;
> c1-1 c2-1
> c1-2 NULL
> {noformat}
> So far everything is correct but now:
> {noformat}
> hive> select c1, c2, c2 from hive_hbase_test;
> c1-1 c2-1 c2-1
> c1-2 NULL c2-1
> {noformat}
> Selecting c2 twice works for the first occurrence, but for the second
> occurrence we actually get the value from the previous row.
> {noformat}
> hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
> c1-1 c3-1 c2-1 c2-1 c3-1 c3-1 c1-1
> c1-2 NULL NULL c2-1 c3-1 c3-1 c1-2
> {noformat}
> We've narrowed this down to a premature assignment of
> {{fieldsInited\[fieldID] = true}} in {{LazyHBaseRow#uncheckedGetField}}.
> We'll try to provide a patch, which will certainly need review.
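
The failure mode described above can be reproduced with a small stand-alone
model. This is only a sketch, not the Hive source: the class LazyRowSketch, its
methods, and the "buggy" switch are hypothetical names invented for
illustration, and the non-buggy branch shows one possible way to avoid the
stale read, not necessarily what HIVE-3179.1.patch does. The assumption is
that the per-row flag is set before the cell lookup, and that a missing cell
returns early without overwriting the cached slot, so a second read of the same
column in that row falls through to the value cached from the previous row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of a lazily parsed row (not the Hive code). */
class LazyRowSketch {
    private final String[] columns;
    private final String[] fields;        // cached values; the object is reused across rows
    private final boolean[] fieldsInited; // per-row "already parsed" flags
    private final boolean buggy;          // hypothetical switch between the two behaviours
    private Map<String, String> cells;    // cells that exist in the current row

    LazyRowSketch(String[] columns, boolean buggy) {
        this.columns = columns;
        this.fields = new String[columns.length];
        this.fieldsInited = new boolean[columns.length];
        this.buggy = buggy;
    }

    /** Point the reused object at a new row; flags are reset, cached values are not. */
    void init(Map<String, String> newCells) {
        this.cells = newCells;
        Arrays.fill(fieldsInited, false);
    }

    String getField(int id) {
        if (!fieldsInited[id]) {
            fieldsInited[id] = true;              // flag is set before the lookup
            String value = cells.get(columns[id]);
            if (value == null && buggy) {
                return null;                      // early return: fields[id] keeps the old row's value
            }
            fields[id] = value;                   // non-buggy variant also caches the null
        }
        return fields[id];
    }

    /** Runs "select <ids> from table" over the two rows of the report. */
    static List<String> select(boolean buggy, int... ids) {
        Map<String, String> row1 = new HashMap<>();
        row1.put("c1", "c1-1");
        row1.put("c2", "c2-1");
        row1.put("c3", "c3-1");
        Map<String, String> row2 = new HashMap<>();
        row2.put("c1", "c1-2");                   // c2 and c3 do not exist for row 2

        LazyRowSketch row = new LazyRowSketch(new String[] {"c1", "c2", "c3"}, buggy);
        List<String> out = new ArrayList<>();
        for (Map<String, String> cells : Arrays.asList(row1, row2)) {
            row.init(cells);
            List<String> line = new ArrayList<>();
            for (int id : ids) {
                String v = row.getField(id);
                line.add(v == null ? "NULL" : v);
            }
            out.add(String.join(" ", line));
        }
        return out;
    }

    public static void main(String[] args) {
        // Column order c1, c3, c2, c2, c3, c3, c1 as in the report.
        System.out.println(select(true, 0, 2, 1, 1, 2, 2, 0));
        System.out.println(select(false, 0, 2, 1, 1, 2, 2, 0));
    }
}
```

With the buggy switch on, the second row of the model comes out as
"c1-2 NULL NULL c2-1 c3-1 c3-1 c1-2", matching the output in the description;
with it off, the repeated columns stay NULL.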