[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721391#comment-14721391
 ] 

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

Here are results from my testing with and without this patch applied. The table 
"my_table" for this testing contains about 8 M rows.

*Restrict query by single key*:

Example query: select * from my_table where key.firstpart="something";

|| Memory (in MB) || With patch || Without patch ||
| 1500 | Out of memory | Out of memory |
| 3000 | 2.5 minutes | Out of memory |
| 6000 | 2.4 minutes | 23 minutes |

*Restrict query by multiple keys*: (Note that the key parts must be successive 
for this to work.)

Example query: select * from my_table where key.firstpart="something" and 
key.secondpart="something2";

|| Memory (in MB) || With filter || Without filter ||
| 1500 | 23 sec | Out of memory |
| 3000 | 19 sec | Out of memory |
| 6000 | 18.8 sec | 24 minutes |

So the filter becomes more restrictive, and the query more efficient, as more 
key parts are specified in the query. To toggle between using the filter and 
not using it, I set the hive.optimize.ppd.storage flag to false so that no 
predicate pushdown happens.
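
As a rough sketch, the toggle between the "With filter" and "Without filter" 
runs looks like this in a Hive session. It assumes my_table is already defined 
as an HBase-backed table with a composite key; the flag defaults to true, so 
the first "set" is only there to make the comparison explicit.

```sql
-- "With filter" run: predicates are pushed down to the storage handler
-- (this is the default value of the flag).
set hive.optimize.ppd.storage=true;
select * from my_table where key.firstpart="something" and key.secondpart="something2";

-- "Without filter" run: no predicate pushdown to the storage handler,
-- so no filter or key range is handed to the HBase scan.
set hive.optimize.ppd.storage=false;
select * from my_table where key.firstpart="something" and key.secondpart="something2";
```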

Finally, the same query without an M/R job:

*Restrict query by multiple keys*: (No M/R job)

Example query: select * from my_table where key.firstpart="something" and 
key.secondpart="something2";

|| Memory (in MB) || With filter || Without filter ||
| 3000 | 5 sec | 19 minutes |

> Capability to add a filter to hbase scan via composite key doesn't work
> -----------------------------------------------------------------------
>
>                 Key: HIVE-11609
>                 URL: https://issues.apache.org/jira/browse/HIVE-11609
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>         Attachments: HIVE-11609.1.patch.txt
>
>
> It seems that the capability to add a filter to an HBase scan, which was 
> added as part of HIVE-6411, doesn't work. This is primarily because in 
> HiveHBaseInputFormat the filter is added in getSplits instead of 
> getRecordReader. This works fine for start and stop keys but not for a filter, 
> because a filter is respected only when an actual scan is performed. This is 
> also related to the initial refactoring that was done as part of HIVE-3420.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
