Marton Bod created HIVE-24133:
---------------------------------

             Summary: Hive query with Hbase storagehandler can give back 
incorrect results when predicate contains null check
                 Key: HIVE-24133
                 URL: https://issues.apache.org/jira/browse/HIVE-24133
             Project: Hive
          Issue Type: Bug
            Reporter: Marton Bod


It has been observed that when using Hbase storage handler and the table 
contains null values, Hive can give back wrong query results, depending on what 
columns we select for and whether the where clause predicate contains any null 
checks.

For example:

create 'default:hive_test', 'cf'
put 'default:hive_test', '1', 'cf:col1', 'val1'
put 'default:hive_test', '1', 'cf:col2', 'val2'

put 'default:hive_test', '2', 'cf:col1', 'val1_2'
put 'default:hive_test', '2', 'cf:col2', 'val2_2'

put 'default:hive_test', '3', 'cf:col1', 'val1_3'
put 'default:hive_test', '3', 'cf:col2', 'val2_3'
put 'default:hive_test', '3', 'cf:col3', 'val3_3'
put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"

put 'default:hive_test', '4', 'cf:col1', 'val1_4'
put 'default:hive_test', '4', 'cf:col2', 'val2_4'

scan 'default:hive_test'

===== HIVE

CREATE EXTERNAL TABLE hbase_hive_test (
rowkey string,
col1 string,
col2 string,
col3 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
)
TBLPROPERTIES("hbase.table.name" = "default:hive_test");

query: select * from hbase_hive_test where col3 is null;

result:
Total MapReduce CPU Time Spent: 10 seconds 980 msec
OK
1 val1 val2 NULL
2 val1_2 val2_2 NULL
4 val1_4 val2_4 NULL

query: select rowkey from hbase_hive_test where col3 is null;

This does not produce any records.

However, select rowkey, col2 from hbase_hive_test where col3 is null;

This gives back the correct results again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to