[ https://issues.apache.org/jira/browse/HIVE-24833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mollitor updated HIVE-24833:
----------------------------------
Description:
I believe that a Hive query against a table backed by the HBase Storage Handler
incorrectly applies predicate pushdown into the storage handler.
I observed a FETCH optimization that took a long time to complete because it
was performing a table scan across the entire HBase table.
The only case in which a predicate should be pushed down to the storage layer
is:
{code:sql}
SELECT * FROM my_hbase_table WHERE row_key=?
{code}
This is appropriate because it is an equality (EQ) predicate on the row key.
Anything else involves a scan of the table, and there is no easy way to
estimate how small that scan will be, so such queries should always be passed
to the compute engine (Tez).
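A minimal sketch of the proposed rule, assuming the WHERE clause has already been split into simple `column OP constant` conjuncts. The class, enum, and record names below are hypothetical and are not Hive's actual HBaseStorageHandler API; the sketch only illustrates which predicates would qualify for pushdown under this proposal, taking the strict reading that only a lone EQ on the row key is pushed down.

{code:java}
import java.util.List;

// Hypothetical illustration of the HIVE-24833 proposal; not Hive's real API.
class RowKeyPushdownRule {

    // Illustrative subset of comparison operators (hypothetical).
    enum Op { EQ, NE, LT, LE, GT, GE }

    // A simple "column OP constant" conjunct (hypothetical type).
    record Predicate(String column, Op op, String value) {}

    // True only when the WHERE clause is exactly one EQ predicate on the
    // row key column, i.e. a single-row lookup rather than a scan.
    static boolean shouldPushDown(List<Predicate> conjuncts, String rowKeyColumn) {
        if (conjuncts.size() != 1) {
            return false; // strict reading: any extra conjunct goes to Tez
        }
        Predicate p = conjuncts.get(0);
        return p.op() == Op.EQ && p.column().equals(rowKeyColumn);
    }

    public static void main(String[] args) {
        String key = "row_key";
        // WHERE row_key = 'r1': point lookup, push down
        System.out.println(shouldPushDown(List.of(new Predicate(key, Op.EQ, "r1")), key)); // prints true
        // WHERE row_key > 'r1': range scan of unknown size, leave to Tez
        System.out.println(shouldPushDown(List.of(new Predicate(key, Op.GT, "r1")), key)); // prints false
        // WHERE val = 'x': full table scan, leave to Tez
        System.out.println(shouldPushDown(List.of(new Predicate("val", Op.EQ, "x")), key)); // prints false
    }
}
{code}

Under this strict reading, `WHERE row_key=? AND other_col=?` is also left to Tez; a less strict variant could push down the row-key equality and evaluate the remaining conjunct as a residual filter.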
was:
I believe that a Hive query against a table backed by the HBase Storage Handler
incorrectly applies predicate pushdown into the storage handler.
I observed a FETCH optimization that took a long time to complete because it
was performing a table scan across the entire HBase table.
The only case in which a predicate should be pushed down to the storage layer
is:
`SELECT * FROM my_hbase_table WHERE row_key=?`
This is appropriate because it is an equality (EQ) predicate on the row key.
Anything else involves a scan of the table, and there is no easy way to
estimate how small that scan will be, so such queries should always be passed
to the compute engine (Tez).
> Hive Should Only Pushdown EQ Predicate on HBaseStorageHandler
> -------------------------------------------------------------
>
> Key: HIVE-24833
> URL: https://issues.apache.org/jira/browse/HIVE-24833
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)