Andrew Mains created HIVE-10545:
-----------------------------------

             Summary: Implement predicate pushdown for queries over HBase snapshots
                 Key: HIVE-10545
                 URL: https://issues.apache.org/jira/browse/HIVE-10545
             Project: Hive
          Issue Type: Improvement
          Components: HBase Handler
            Reporter: Andrew Mains


Hive's HBase integration currently supports queries over HBase snapshots, and 
predicate pushdown for queries over HBase tables, but does not support 
predicate pushdown for queries over HBase snapshots. This appears to be largely 
because the HBase handler uses the `mapred` 
TableSnapshotInputFormat implementation, which doesn't support pushing a Scan 
to the job, rather than the `mapreduce` implementation, which does (see 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.html#initTableSnapshotMapJob(java.lang.String,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapred.JobConf,%20boolean,%20org.apache.hadoop.fs.Path
 vs 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html#initTableSnapshotMapperJob(java.lang.String,%20org.apache.hadoop.hbase.client.Scan,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapreduce.Job,%20boolean,%20org.apache.hadoop.fs.Path))
 .

Hive should be able to switch to the `mapreduce` implementation (performing the 
necessary shimming between `mapred` and `mapreduce`), and thus gain the ability to 
push predicates down to the input format in the same way as is done with 
HiveHBaseTableInputFormat. This switch should yield significant performance 
improvements for queries that specify range or equality conditions on the row key 
(which is likely a reasonably common case).
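
To illustrate why a pushed-down row-key condition can save work, here is a 
minimal sketch in plain Java. It does not use the HBase or Hive APIs: 
RowKeyRange and SnapshotSplitPruner are hypothetical names invented for this 
example, keys are Strings rather than byte arrays, and null stands for an 
unbounded key. It only shows how an equality or range condition could become 
start/stop bounds, and how those bounds let regions outside the range be skipped.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the row-key bounds a pushed-down scan would carry.
 * Not an HBase class. Keys are Strings compared with String.compareTo, which
 * matches HBase's byte-wise ordering only for plain ASCII keys.
 */
final class RowKeyRange {
    final String startInclusive; // null means unbounded below
    final String stopExclusive;  // null means unbounded above

    RowKeyRange(String startInclusive, String stopExclusive) {
        this.startInclusive = startInclusive;
        this.stopExclusive = stopExclusive;
    }

    /** key = k becomes [k, k + NUL), the smallest half-open range holding only k. */
    static RowKeyRange forEquality(String key) {
        return new RowKeyRange(key, key + '\0');
    }

    /** key >= lo AND key < hi becomes [lo, hi). */
    static RowKeyRange forRange(String lo, String hi) {
        return new RowKeyRange(lo, hi);
    }

    /** True if the region [regionStart, regionEnd) may hold keys in this range. */
    boolean overlaps(String regionStart, String regionEnd) {
        boolean startsBeforeRegionEnd = startInclusive == null || regionEnd == null
                || startInclusive.compareTo(regionEnd) < 0;
        boolean stopsAfterRegionStart = stopExclusive == null || regionStart == null
                || stopExclusive.compareTo(regionStart) > 0;
        return startsBeforeRegionEnd && stopsAfterRegionStart;
    }
}

final class SnapshotSplitPruner {
    /** Keeps only the regions, given as {start, end} pairs, that overlap the range. */
    static List<String[]> prune(List<String[]> regions, RowKeyRange range) {
        List<String[]> kept = new ArrayList<>();
        for (String[] region : regions) {
            if (range.overlaps(region[0], region[1])) {
                kept.add(region);
            }
        }
        return kept;
    }
}
```

With three regions split at "g" and "n", a condition such as key >= "h" AND 
key < "k" would leave only the middle region to read. Without pushdown, every 
region of the snapshot has to be scanned and filtered afterwards.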


