[ https://issues.apache.org/jira/browse/HIVE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641834#comment-14641834 ]
Swarnim Kulkarni commented on HIVE-11327:
-----------------------------------------
[~yzuehlke] Thanks for logging this. This is expected behavior: predicate
pushdown for simple delimited composite keys is not yet supported in Hive. One
solution is to treat your key as a complex composite key instead and provide a
custom implementation for it. That way, you should be able to take advantage of
HBase filters and make your queries run much faster. Please refer to the
documentation for further details [1].
[1]
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-ComplexCompositeRowKeysandHBaseKeyFactory
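The suggestion works because an equality predicate on the first struct field, such as `rowkey.p1 = 'xyz'`, can be turned into an HBase row-key range. The sketch below is a hypothetical helper (`first_field_scan_range` is not a Hive or HBase API). It assumes the key is stored as `p1;p2;p3`, using the `;` collection delimiter from the table definition in the issue, and computes the `[start, stop)` range that covers every key whose first field equals a given value. A custom key implementation would have to perform an equivalent translation. See the linked documentation for the actual interface.

```python
from __future__ import annotations


def first_field_scan_range(p1: str, delimiter: bytes = b";") -> tuple[bytes, bytes]:
    """Hypothetical helper: return the [start, stop) row-key range covering
    every composite key whose first field equals p1, assuming keys are
    serialized as p1<delimiter>p2<delimiter>p3 (an assumption, see above)."""
    start = p1.encode("utf-8") + delimiter
    # The stop row is the smallest byte string greater than every key that
    # begins with `start`: increment the last byte, dropping trailing 0xFF
    # bytes first (they cannot be incremented without carrying).
    stop = bytearray(start)
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        # Every byte was 0xFF: no finite upper bound, scan to the end.
        return bytes(start), b""
    stop[-1] += 1
    return bytes(start), bytes(stop)


start, stop = first_field_scan_range("xyz")
print(start, stop)  # b'xyz;' b'xyz<'  (';' is 0x3B, '<' is 0x3C)
```

With this range, `xyz;abc;def` falls inside `[xyz;, xyz<)`, while `xyzz;abc;def` and `xy;abc;def` fall outside it. Only rows whose first field is exactly `xyz` are scanned.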
> HiveQL to HBase - Predicate Pushdown for composite key not working
> ------------------------------------------------------------------
>
> Key: HIVE-11327
> URL: https://issues.apache.org/jira/browse/HIVE-11327
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler, Hive
> Affects Versions: 0.14.0
> Reporter: Yannik Zuehlke
> Priority: Blocker
>
> I am using Hive 0.14 and HBase 0.98.8, and I would like to use HiveQL to
> access an HBase "table".
> I created a table with a complex composite rowkey:
> ----
> {quote}
> CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string,
> p3:string>, column1 string, column2 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> COLLECTION ITEMS TERMINATED BY ';'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,cf:c1,cf:c2")
> TBLPROPERTIES("hbase.table.name"="hbase_table");
> {quote}
> ----
> The table is created successfully, but the following HiveQL query takes
> forever:
> ----
> {quote}
> SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz';
> {quote}
> ----
> I am working with 1 TB of data (around 1.5 billion records), and this query
> takes forever (it ran overnight and had still not finished in the morning).
> I set the log4j log level to DEBUG and found some interesting
> information:
> ----
> {quote}
> 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory
> (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias :
> hive_hbase
> 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory
> (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz')
> {quote}
> ----
> But some lines later:
> ----
> {quote}
> 2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory
> (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible
> for predicate: (rowkey.p1 = 'xyz')
> {quote}
> ----
> So my guess is that Hive over HBase does not push the predicate down at all
> and instead starts a MapReduce job over the whole table.
> The normal HBase scan (via the HBase Shell) takes around 5 seconds.
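The "No pushdown possible" log line explains why the query is slow. Without pushdown, every row key has to be read and filtered. A row-key range scan, which a pushed-down predicate would allow, instead seeks directly to the matching rows. The toy model below illustrates the difference. It uses a sorted Python list as a stand-in for an HBase table, since HBase keeps rows sorted by row key. Both functions are hypothetical and ignore regions, caching, and MapReduce overhead. The key layout `p1;p2;p3` is the same assumption as in the table definition above.

```python
from bisect import bisect_left


def full_scan(keys, p1):
    """No pushdown: read every row key, decode the first field, then filter."""
    examined, matches = 0, []
    for key in keys:
        examined += 1
        if key.split(b";", 1)[0] == p1:
            matches.append(key)
    return matches, examined


def range_scan(keys, start, stop):
    """Pushdown as a row-key range: binary-search to `start`, then read rows
    until `stop`. Only rows inside the range are counted as examined."""
    i = bisect_left(keys, start)
    examined, matches = 0, []
    while i < len(keys) and keys[i] < stop:
        matches.append(keys[i])
        examined += 1
        i += 1
    return matches, examined


# Hypothetical table: 3,000 keys, 1,000 of which have p1 == "xyz".
keys = sorted(f"{p1};{i};x".encode() for p1 in ("abc", "xyz", "zzz") for i in range(1000))
_, full_n = full_scan(keys, b"xyz")
_, range_n = range_scan(keys, b"xyz;", b"xyz<")
print(full_n, range_n)  # 3000 1000
```

Both functions return the same rows. The full scan's cost grows with the size of the table, while the range scan's cost grows only with the number of matching rows. That difference matters a great deal at roughly 1.5 billion records.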
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)