[
https://issues.apache.org/jira/browse/HIVE-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569286#comment-14569286
]
Christian Dietze commented on HIVE-10891:
-----------------------------------------
It seems that the
[SimpleFetchOptimizer|https://github.com/apache/hive/blob/branch-1.1/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java]
acts a little too aggressively here. From my understanding of the code,
there is a check whether the filter only affects columns that are partition keys. In
that case the threshold check is bypassed (see [line 147 of
SimpleFetchOptimizer|https://github.com/apache/hive/blob/branch-1.1/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java#L147]).
In the query from the issue description, the filter is on a different column;
nevertheless, the filter is bypassed due to [these
lines|https://github.com/apache/hive/blob/branch-1.1/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java#L200]:
{code:java}
if (PartitionPruner.onlyContainsPartnCols(table, pruner)) {
  bypassFilter = !pctx.getPrunedPartitions(alias, ts).hasUnknownPartitions();
}
{code}
A workaround seems to be to rein in the optimizer by setting
{code:xml}
<property>
  <name>hive.fetch.task.conversion</name>
  <value>minimal</value>
</property>
{code}
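If changing hive-site.xml is not an option, the same property can usually be set for a single session from the Hive CLI or Beeline. This is only a sketch: it assumes the server configuration allows the session to override the property, and I have not verified that the query from the issue description then runs as an MR job on 1.1.0.
{code:sql}
-- Assumption: hive.fetch.task.conversion may be overridden per session.
set hive.fetch.task.conversion=minimal;

-- Print the effective value to confirm the override took effect.
set hive.fetch.task.conversion;

-- Query from the issue description, unchanged.
select *
from partitioned_table
where not_the_partition_key_column = "xyz"
limit 100;
{code}
According to the Hive configuration documentation, "minimal" restricts fetch task conversion to SELECT *, filters on partition columns, and LIMIT, so a filter on a non-partition column should no longer qualify.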
> Limited fetch on partitioned table can eat up all heap
> ------------------------------------------------------
>
> Key: HIVE-10891
> URL: https://issues.apache.org/jira/browse/HIVE-10891
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 1.1.0
> Reporter: Christoph Lipka
>
> When doing a query like
> {code}
> select *
> from partitioned_table
> where not_the_partition_key_column = "xyz"
> limit 100
> {code}
> it is executed in memory. For all tables except the smallest this behavior
> quickly consumes the complete heap and crashes the server.
> If the limit clause is omitted, an MR job is started and the query is executed
> without memory issues. One can also work around this problem by extending the
> query to also filter on the partition key, like
> {code}
> select *
> from partitioned_table a
> where a.not_the_partition_key_column = "xyz"
> and a.partition_key_column = (select b.partition_key_column from
> partitioned_table b)
> limit 100
> {code}
> In this case Hive also creates an MR job.