[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878388#comment-13878388 ]
Hive QA commented on HIVE-6157:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12624267/HIVE-6157.01.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4943 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/980/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/980/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12624267

> Fetching column stats slower than the 101 during rush hour
> ----------------------------------------------------------
>
>                 Key: HIVE-6157
>                 URL: https://issues.apache.org/jira/browse/HIVE-6157
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.13.0
>            Reporter: Gunther Hagleitner
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch,
>                      HIVE-6157.nogen.patch, HIVE-6157.prelim.patch
>
>
> "hive.stats.fetch.column.stats" controls whether the column stats for a table
> are fetched during explain (in Tez: during query planning). On my setup (1
> table 4000 partitions, 24 columns) the time spent in semantic analyze goes
> from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent
> fetching column stats...
> The reason is probably that the APIs force you to make separate metastore
> calls for each column in each partition. That's probably the first thing that
> has to change. The question is if in addition to that we need to cache this
> in the client or store the stats as a single blob in the database to further
> cut down on the time. However, the way it stands right now column stats seem
> unusable.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
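
For a rough sense of scale, the numbers in the description can be turned into a back-of-the-envelope call count. This is only a sketch: it assumes the report's guess is right that one metastore call is made per column per partition, and the batched variant (all columns for a batch of partitions in one call) is a hypothetical API shape, not something Hive is known to provide here. The function names and the batch size of 1000 are made up for illustration.

```python
# Back-of-the-envelope model of metastore round trips.
# Figures come from the issue description; the batched API is hypothetical.
PARTITIONS = 4000
COLUMNS = 24
STATS_FETCH_SECONDS = 65.0  # ~66 s with the flag on minus ~1 s with it off


def per_column_calls(partitions: int, columns: int) -> int:
    """One call for each column in each partition (the suspected current behavior)."""
    return partitions * columns


def batched_calls(partitions: int, batch_size: int) -> int:
    """One call per batch of partitions, returning all columns at once (assumed)."""
    return -(-partitions // batch_size)  # ceiling division


calls = per_column_calls(PARTITIONS, COLUMNS)
ms_per_call = STATS_FETCH_SECONDS / calls * 1000

print(f"per-column calls: {calls}")
print(f"implied cost per call: {ms_per_call:.2f} ms")
print(f"batched calls (1000 partitions each): {batched_calls(PARTITIONS, 1000)}")
```

Under these assumptions the 65 seconds would be spread over 96,000 calls, so each call is cheap on its own and the total is dominated by the number of round trips. That is consistent with the suggestion that reducing the call count is the first thing to change, before client-side caching or a single-blob layout.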