[
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721391#comment-14721391
]
Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------
Here are the results from my testing with and without this patch applied. The
table "my_table" used for this testing contains about 8 million rows.
*Restrict query by single key*:
Example query: select * from my_table where key.firstpart="something";
|| Memory(in MB) || With patch || Without patch ||
| 1500 | Out of memory | Out of memory |
| 3000 | 2.5 minutes | Out of memory |
| 6000 | 2.4 minutes | 23 minutes |
*Restrict query by multiple keys*: (Note that the key parts must be successive
for this to work.)
Example query: select * from my_table where key.firstpart="something" and
key.secondpart="something2";
|| Memory(in MB) || With filter || Without filter ||
| 1500 | 23 sec | Out of memory |
| 3000 | 19 sec | Out of memory |
| 6000 | 18.8 sec | 24 minutes |
So the filter becomes more restrictive, and the query more efficient, as more
key parts are specified in the query. To toggle between using the filter and
not using it, I set the hive.optimize.ppd.storage flag to false so that no
predicate pushdown happens.
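As a rough illustration of why the key parts have to be successive, here is a
minimal Java sketch. It assumes the composite key is serialized by joining its
parts in order with a delimiter; the class name, method name, delimiter and the
"thirdpart" field are hypothetical and are not taken from the patch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive or HBase: it only models how a
// row-key prefix could be derived from equality predicates on key parts.
final class CompositeKeyPrefix {

    // Assumption: the row key is the key parts joined in order by 'delimiter'.
    static String buildPrefix(List<String> keyParts,
                              Map<String, String> predicates,
                              char delimiter) {
        StringBuilder prefix = new StringBuilder();
        boolean first = true;
        for (String part : keyParts) {
            String value = predicates.get(part);
            if (value == null) {
                // A missing part ends the prefix: predicates on later parts
                // cannot narrow a prefix-based filter any further.
                break;
            }
            if (!first) {
                prefix.append(delimiter);
            }
            prefix.append(value);
            first = false;
        }
        return prefix.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("firstpart", "secondpart", "thirdpart");

        // Successive parts: both values contribute to the prefix.
        System.out.println(buildPrefix(parts,
                Map.of("firstpart", "something", "secondpart", "something2"), '_'));

        // Gap at secondpart: only firstpart contributes to the prefix.
        System.out.println(buildPrefix(parts,
                Map.of("firstpart", "something", "thirdpart", "something3"), '_'));
    }
}
```

Under these assumptions, predicates on firstpart and secondpart give a single
two-part prefix, while predicates on firstpart and thirdpart only fall back to
the firstpart prefix, which matches the less selective single-key case above.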
Finally, the same query without an M/R job:
*Restrict query by multiple keys*: (No M/R job)
Example query: select * from my_table where key.firstpart="something" and
key.secondpart="something2";
|| Memory(in MB) || With filter || Without filter ||
| 3000 | 5 sec | 19 minutes |
> Capability to add a filter to hbase scan via composite key doesn't work
> -----------------------------------------------------------------------
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Reporter: Swarnim Kulkarni
> Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt
>
>
> It seems like the capability to add a filter to an HBase scan, which was
> added as part of HIVE-6411, doesn't work. This is primarily because in
> HiveHBaseInputFormat the filter is added in getSplits instead of
> getRecordReader. This works fine for start and stop keys but not for a
> filter, because a filter is respected only when an actual scan is performed.
> This is also related to the initial refactoring that was done as part of
> HIVE-3420.