[
https://issues.apache.org/jira/browse/HIVE-16412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970250#comment-15970250
]
Amir Shenavandeh commented on HIVE-16412:
-----------------------------------------
An empty prune expression is passed:
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: Started pruning partiton
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: dbname = default
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: tabname = ext_data_part
2017-04-16T00:59:35,372 TRACE [main] ppr.PartitionPruner: prune Expression =
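
As a rough illustration of why an empty prune expression would lead to the full partition listing seen in the stack trace quoted below, here is a toy Python model. It is not Hive's code: `prune`, `Partition`, and `PruneExpr` are hypothetical names, and the fall-back-to-everything behaviour is an assumption inferred from the `getAllPartsFromCacheOrServer` frame.

```python
from typing import Callable, Dict, List, Optional

Partition = Dict[str, str]               # e.g. {"a": "9957"}
PruneExpr = Callable[[Partition], bool]  # predicate built from the partition spec


def prune(all_partitions: List[Partition],
          prune_expr: Optional[PruneExpr]) -> List[Partition]:
    """Toy model of a partition pruner (hypothetical, not Hive's implementation)."""
    if prune_expr is None:
        # Assumption: with no predicate to filter on, the pruner falls back
        # to every partition of the table.
        return list(all_partitions)
    return [p for p in all_partitions if prune_expr(p)]


partitions = [{"a": str(v)} for v in (9955, 9956, 9957)]

# A predicate derived from partition(a=9957) keeps a single partition.
selected = prune(partitions, lambda p: p["a"] == "9957")

# An empty (None) prune expression returns every partition.
everything = prune(partitions, None)
```

If this model is close to what happens, the cost of the empty expression grows with the total number of partitions in the table rather than with the size of the partition spec.
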
> Hive on Tez incorrect partition pruning ANALYZE TABLE
> -----------------------------------------------------
>
> Key: HIVE-16412
> URL: https://issues.apache.org/jira/browse/HIVE-16412
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.1.1
> Environment: Hadoop2.7.3, Hive 2.1.1, Tez 0.8.5
> Reporter: Amir Shenavandeh
> Labels: Tez, hive, partition_pruner
>
> With Hive on Tez, running ANALYZE TABLE T PARTITION (...) COMPUTE STATISTICS;
> on a partitioned table gathers stats for all partitions from the metastore,
> even though the partition spec selects only a subset. The same statement runs
> efficiently with Hive on MR.
> For example:
> ---
> analyze table ext_data_part partition(a=9957) compute statistics noscan
> ---
> Will cause:
> ---
> 2017-04-09T22:25:30,332 DEBUG [main] metastore.MetaStoreDirectSql: Direct SQL query in 12.30189ms + 0.037891ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? ]
> ---
> And:
> 2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger (:()) - </PERFLOG method=TezCompiler start=1488473648104 end=1488473648104 duration=0 from=org.apache.hadoop.hive.ql.parse.TezCompiler Setup dynamic partition pruning>
> 2017-03-02T16:54:08,104 DEBUG [main([])]: log.PerfLogger (:()) - <PERFLOG method=TezCompiler from=org.apache.hadoop.hive.ql.parse.TezCompiler>
> 2017-03-02T16:54:08,110 DEBUG [main([])]: log.PerfLogger (:()) - <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
> 2017-03-02T16:54:08,153 DEBUG [main([])]: log.PerfLogger (:()) - </PERFLOG method=partition-retrieving start=1488473648110 end=1488473648153 duration=43 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
> ---
> The stack trace:
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2265)
> - locked <0x00000003de3798f0> (a org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler)
> at com.sun.proxy.$Proxy21.listPartitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.getAllPartitionsOf(Hive.java:2301)
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartitions(PartitionPruner.java:454)
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getAllPartsFromCacheOrServer(PartitionPruner.java:236)
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:195)
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:144)
> at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:511)
> at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:504)
> at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:121)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
> at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
> at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:259)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:128)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:134)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10947)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10526)
> at org.apache.hadoop.hive.ql.parse.ColumnStatsSemanticAnalyzer.analyze(ColumnStatsSemanticAnalyzer.java:385)