junan chen created HIVE-13404: --------------------------------- Summary: Could not launch hive on spark when hive.stats.fetch.column.stats=true Key: HIVE-13404 URL: https://issues.apache.org/jira/browse/HIVE-13404 Project: Hive Issue Type: Bug Components: CBO, HiveServer2, spark-branch Affects Versions: 1.2.1 Environment: Test environment is HDP2.3 Hive cli 1.2.1 spark 1.4.1 without hadoop Reporter: junan chen Assignee: Vaibhav Gumashta
I'm trying to run hive on spark and I followed this guide(https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started) I used the following command to launch spark application hive --hiveconf hive.root.logger=info,console -f hive.hql my hive.hql is: set spark.home=/usr/lib/spark-1.4.1-without-hadoop; set spark.master=yarn-cluster; set spark.app.name=hive-test; set spark.yarn.jar=hdfs:///lib/spark-1.4.1-without-hadoop/lib/spark-assembly-1.4.1-hadoop2.2.0.jar; set hive.execution.engine=spark; set hive.fetch.task.conversion=none; select * from table where ds='2016-03-30'; I Got org.apache.thrift.transport.TTransportException: 16/04/01 10:55:25 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner> 16/04/01 10:55:25 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1459479325296 end=1459479325808 duration=512 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner> 16/04/01 10:55:26 [main]: WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect. org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_aggr_stats_for(ThriftHiveMetastore.java:3029) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_aggr_stats_for(ThriftHiveMetastore.java:3016) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAggrColStatsFor(HiveMetaStoreClient.java:2064) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) at $Proxy9.getAggrColStatsFor(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getAggrColStatsFor(Hive.java:3115) at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:251) at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136) at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:111) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10189) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) and I found the following error log in hivemetastore.log 2016-04-01 10:55:26,186 ERROR [pool-3-thread-198]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) But when I use beeline to launch spark app or set execution engine to mr, I won't get this error. And I use "hive --debug:swapSuspend,port=8760 --hiveconf hive.root.logger=info,console -f hive.hql". I found that if hive.stats.fetch.column.stats set to false, I will not get this error, So I guess, when hive.stats.fetch.column.stats set to true, org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:251) will try to get AggrStats, and there was no aggregate function in my column, and I got this error finally. Am I right about this error? -- This message was sent by Atlassian JIRA (v6.3.4#6332)