Marton Bod created HIVE-24133: --------------------------------- Summary: Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check Key: HIVE-24133 URL: https://issues.apache.org/jira/browse/HIVE-24133 Project: Hive Issue Type: Bug Reporter: Marton Bod
It has been observed that when using Hbase storage handler and the table contains null values, Hive can give back wrong query results, depending on what columns we select for and whether the where clause predicate contains any null checks. For example: create 'default:hive_test', 'cf' put 'default:hive_test', '1', 'cf:col1', 'val1' put 'default:hive_test', '1', 'cf:col2', 'val2' put 'default:hive_test', '2', 'cf:col1', 'val1_2' put 'default:hive_test', '2', 'cf:col2', 'val2_2' put 'default:hive_test', '3', 'cf:col1', 'val1_3' put 'default:hive_test', '3', 'cf:col2', 'val2_3' put 'default:hive_test', '3', 'cf:col3', 'val3_3' put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2" put 'default:hive_test', '4', 'cf:col1', 'val1_4' put 'default:hive_test', '4', 'cf:col2', 'val2_4' scan 'default:hive_test' ===== HIVE CREATE EXTERNAL TABLE hbase_hive_test ( rowkey string, col1 string, col2 string, col3 string ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3" ) TBLPROPERTIES("hbase.table.name" = "default:hive_test"); query: select * from hbase_hive_test where col3 is null; result: Total MapReduce CPU Time Spent: 10 seconds 980 msec OK 1 val1 val2 NULL 2 val1_2 val2_2 NULL 4 val1_4 val2_4 NULL query: select rowkey from hbase_hive_test where col3 is null; This does not produce any records. However, select rowkey, col2 from hbase_hive_test where col3 is null; This gives back the correct results again. -- This message was sent by Atlassian Jira (v8.3.4#803005)