[ 
https://issues.apache.org/jira/browse/HIVE-24544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabien Carrion updated HIVE-24544:
----------------------------------
    Description: 
When I apply a timestamp filter, I get the wrong data.

On a table like this:

CREATE EXTERNAL TABLE t1 (key string, v string, ts timestamp) STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
("hbase.columns.mapping" = ":key,cf:v,:timestamp") TBLPROPERTIES 
("hbase.table.name" = "t1", "hbase.table.default.storage.type" = "binary", 
"external.table.purge" = "false");

A request such as:

select key, ts from t1
where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 00:00:00';

returns rows with ts < '2020-12-01 00:00:00', which the filter should exclude.

After investigation, it looks like the timestamp filter is never applied in the 
HiveHBaseTableInputFormat.getRecordReader method, which is used to create the 
actual MapReduce job.

However, it is applied in the HiveHBaseTableInputFormat.getSplitsInternal method, 
which is used to create the map tasks.

So I copied the code from the second method into the first.

I attached a small patch. It is a little hacky, and I am not sure it respects the 
philosophy of the component, but it works.
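
To make the expected behavior concrete, here is a minimal sketch of the time 
range that the query's two predicates should translate to. It is not the 
attached patch and does not call Hive or HBase. The class and method names 
(TimeRangeSketch, toEpochMillis, timeRange, inRange) are hypothetical, and 
parsing the literals as UTC is an assumption made only for illustration. The 
range is half-open, [min, max), which matches the semantics of HBase's 
Scan.setTimeRange(minStamp, maxStamp).

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeRangeSketch {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical helper: parse a timestamp literal as epoch milliseconds.
    // Assumption: the literal is interpreted in UTC.
    static long toEpochMillis(String literal) {
        return LocalDateTime.parse(literal, FMT)
            .toInstant(ZoneOffset.UTC)
            .toEpochMilli();
    }

    // Map "ts >= lower AND ts < upper" to a half-open range {min, max}.
    static long[] timeRange(String lowerInclusive, String upperExclusive) {
        return new long[] {
            toEpochMillis(lowerInclusive),
            toEpochMillis(upperExclusive)
        };
    }

    // True when a cell timestamp falls inside [min, max).
    static boolean inRange(long ts, long[] range) {
        return ts >= range[0] && ts < range[1];
    }

    public static void main(String[] args) {
        long[] range = timeRange("2020-12-01 00:00:00", "2020-12-02 00:00:00");
        System.out.println("min=" + range[0] + " max=" + range[1]);
        // A cell written one millisecond before the lower bound must be excluded.
        System.out.println(inRange(range[0] - 1, range));
    }
}
```

If the reader side scans without such a range, rows older than the lower bound 
can still be returned, which is consistent with the behavior described above.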

 

  was:
When I apply a timestamp filter, I get the wrong data. A request such as:

```

select key, ts from t1
where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 00:00:00';

```

returns rows with ts < '2020-12-01 00:00:00', which the filter should exclude.

After investigation, it looks like the timestamp filter is never applied in the 
HiveHBaseTableInputFormat.getRecordReader method, which is used to create the 
actual MapReduce job.

However, it is applied in the HiveHBaseTableInputFormat.getSplitsInternal method, 
which is used to create the map tasks.

So I copied the code from the second method into the first.

I attached a small patch. It is a little hacky, and I am not sure it respects the 
philosophy of the component, but it works.

 


> HBase Timestamp filter never gets converted to a timerange filter
> -----------------------------------------------------------------
>
>                 Key: HIVE-24544
>                 URL: https://issues.apache.org/jira/browse/HIVE-24544
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>            Reporter: Fabien Carrion
>            Priority: Minor
>         Attachments: timerange.patch
>



