[ 
https://issues.apache.org/jira/browse/HIVE-24133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193453#comment-17193453
 ] 

zhishui commented on HIVE-24133:
--------------------------------

This is caused by scan of hbase which only scan CF:COL of what your sql need.

In method of HiveHBaseInputFormatUtil.getScan, there only add necessary col id 
to scan and it's may have some trouble about the scenario you provide, but you 
could not be critical about this. However, we could not always scan all columns.

The basic difference is that how to understand between row-based storage (hive) 
and column-based storage(hbase)

> Hive query with Hbase storagehandler can give back incorrect results when 
> predicate contains null check
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24133
>                 URL: https://issues.apache.org/jira/browse/HIVE-24133
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marton Bod
>            Assignee: zhishui
>            Priority: Major
>
> It has been observed that when using Hbase storage handler and the table 
> contains null values, Hive can give back wrong query results, depending on 
> what columns we select for and whether the where clause predicate contains 
> any null checks.
> For example:
> create 'default:hive_test', 'cf'
> put 'default:hive_test', '1', 'cf:col1', 'val1'
> put 'default:hive_test', '1', 'cf:col2', 'val2'
> put 'default:hive_test', '2', 'cf:col1', 'val1_2'
> put 'default:hive_test', '2', 'cf:col2', 'val2_2'
> put 'default:hive_test', '3', 'cf:col1', 'val1_3'
> put 'default:hive_test', '3', 'cf:col2', 'val2_3'
> put 'default:hive_test', '3', 'cf:col3', 'val3_3'
> put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"
> put 'default:hive_test', '4', 'cf:col1', 'val1_4'
> put 'default:hive_test', '4', 'cf:col2', 'val2_4'
> scan 'default:hive_test'
> ===== HIVE
> CREATE EXTERNAL TABLE hbase_hive_test (
> rowkey string,
> col1 string,
> col2 string,
> col3 string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
> )
> TBLPROPERTIES("hbase.table.name" = "default:hive_test");
> query: select * from hbase_hive_test where col3 is null;
> result:
> Total MapReduce CPU Time Spent: 10 seconds 980 msec
> OK
> 1 val1 val2 NULL
> 2 val1_2 val2_2 NULL
> 4 val1_4 val2_4 NULL
> query: select rowkey from hbase_hive_test where col3 is null;
> This does not produce any records.
> However, select rowkey, col2 from hbase_hive_test where col3 is null;
> This gives back the correct results again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to