[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Francke updated HIVE-3179:
-------------------------------

    Description: 
We found a severe issue in the HBase Handler: Hive can return incorrect data
if a column has NULL values in HBase (that is, the cell does not exist at all).

In HBase Shell:

{noformat}
create 'hive_hbase_test', 'test'
put 'hive_hbase_test', '1', 'test:c1', 'c1-1'
put 'hive_hbase_test', '1', 'test:c2', 'c2-1'
put 'hive_hbase_test', '1', 'test:c3', 'c3-1'
put 'hive_hbase_test', '2', 'test:c1', 'c1-2'
{noformat}

In Hive:

{noformat}
DROP TABLE IF EXISTS hive_hbase_test;
CREATE EXTERNAL TABLE hive_hbase_test (
  id int,
  c1 string,
  c2 string,
  c3 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key#s,test:c1#s,test:c2#s,test:c3#s")
TBLPROPERTIES("hbase.table.name" = "hive_hbase_test");

hive> select * from hive_hbase_test;
OK
1       c1-1    c2-1    c3-1
2       c1-2    NULL    NULL

hive> select c1 from hive_hbase_test;
c1-1
c1-2

hive> select c1, c2 from hive_hbase_test;
c1-1    c2-1
c1-2    NULL
{noformat}

So far everything is correct, but now:

{noformat}
hive> select c1, c2, c2 from hive_hbase_test;
c1-1    c2-1    c2-1
c1-2    NULL    c2-1
{noformat}

Selecting c2 twice works the first time, but the second reference in the same
row returns the value from the previous row.

{noformat}
hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
c1-1    c3-1    c2-1    c2-1    c3-1    c3-1    c1-1
c1-2    NULL    NULL    c2-1    c3-1    c3-1    c1-2
{noformat}

We've narrowed this down to {{fieldsInited\[fieldID] = true}} being set too
early in {{LazyHBaseRow#uncheckedGetField}}. We'll try to provide a patch,
which will certainly need review.
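To illustrate the suspected mechanism, here is a simplified, hypothetical Java model. It is not the actual {{LazyHBaseRow}} source, and {{LazyRowSketch}}, {{LazyField}}, {{select}} and the {{fixed}} switch are names made up for this sketch. It assumes that the row object and its per-field objects are reused from one HBase result to the next, and that a {{fieldsInited}} array records which fields already hold the current row's data. If the flag is set before checking whether the cell exists, a missing cell returns null on the first access. Any later access to the same field in the same row then skips initialization and returns the reused field object, which still holds the previous row's value. Setting the flag only after the field has been initialized is one possible fix, not necessarily what the final patch will do. The model leaves out the {{id}} key column, so field IDs 0 to 2 stand for c1 to c3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the lazy-row pattern behind HIVE-3179.
 * NOT the real LazyHBaseRow code; all names here are illustrative.
 */
public class LazyRowSketch {

    /** Stand-in for a lazy field object that is reused from row to row. */
    static final class LazyField {
        private String value;

        void init(String data) { value = data; }

        String getObject() { return value; }
    }

    private final List<String> columns;
    private final LazyField[] fields;
    private final boolean[] fieldsInited;
    private final boolean fixed;        // false = flag set early (buggy), true = flag set late
    private Map<String, String> result; // qualifier -> value; an absent key means "no cell"

    LazyRowSketch(List<String> columns, boolean fixed) {
        this.columns = columns;
        this.fixed = fixed;
        this.fields = new LazyField[columns.size()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = new LazyField();
        }
        this.fieldsInited = new boolean[columns.size()];
    }

    /** Point the reused row object at the next HBase result and reset the flags. */
    void init(Map<String, String> nextResult) {
        result = nextResult;
        Arrays.fill(fieldsInited, false);
    }

    String uncheckedGetField(int fieldID) {
        if (!fieldsInited[fieldID]) {
            if (!fixed) {
                fieldsInited[fieldID] = true; // BUG: flag set before we know the cell exists
            }
            String cell = result.get(columns.get(fieldID));
            if (cell == null) {
                // Missing cell: the reused LazyField still holds the previous row's value.
                return null;
            }
            fields[fieldID].init(cell);
            fieldsInited[fieldID] = true; // in both modes, the field now holds this row's data
        }
        return fields[fieldID].getObject();
    }

    /** Run a projection (list of field IDs) over the two rows from the report. */
    static List<List<String>> select(int[] fieldIDs, boolean fixed) {
        List<Map<String, String>> rows = List.of(
                Map.of("c1", "c1-1", "c2", "c2-1", "c3", "c3-1"), // row key '1'
                Map.of("c1", "c1-2"));                          // row key '2': no c2/c3 cells
        LazyRowSketch row = new LazyRowSketch(List.of("c1", "c2", "c3"), fixed);
        List<List<String>> out = new ArrayList<>();
        for (Map<String, String> r : rows) {
            row.init(r);
            List<String> values = new ArrayList<>();
            for (int id : fieldIDs) {
                values.add(row.uncheckedGetField(id));
            }
            out.add(values);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] query = {0, 1, 1}; // select c1, c2, c2
        for (boolean fixed : new boolean[] {false, true}) {
            System.out.println(fixed ? "flag set late:" : "flag set early:");
            for (List<String> values : select(query, fixed)) {
                List<String> printable = new ArrayList<>();
                for (String v : values) {
                    printable.add(v == null ? "NULL" : v); // print nulls the way Hive does
                }
                System.out.println(String.join("\t", printable));
            }
        }
    }
}
```

Running {{main}} prints {{c1-2 NULL c2-1}} for the second row when the flag is set early, matching the {{select c1, c2, c2}} output above, and {{c1-2 NULL NULL}} when the flag is set only after initialization.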

> HBase Handler doesn't handle NULLs properly
> -------------------------------------------
>
>                 Key: HIVE-3179
>                 URL: https://issues.apache.org/jira/browse/HIVE-3179
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>    Affects Versions: 0.9.0
>            Reporter: Lars Francke
>            Priority: Critical

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
