[
https://issues.apache.org/jira/browse/HIVE-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185281#comment-16185281
]
liyunzhang_intel commented on HIVE-17634:
-----------------------------------------
[~vgarg]: thanks for your reply. I understand the importance of column
stats for estimating the statistics. What confuses me is that in the logical
plan we pass {{true}} to fetch the column stats from the metastore, and even
when we cannot get a
[result|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L351]
from the metastore we still call
[estimateStatsForMissingCols|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L354].
But in the statistics estimation
([StatsRulesProcFactory|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L134]),
we do not even estimate the column stats once
{{hive.stats.fetch.column.stats}} is set to false. Could we refactor
[StatsUtils#collectStatistics|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L349]
like this:
{code}
if (fetchColStats) {
  colStats = getTableColumnStats(table, schema, neededColumns,
      colStatsCache);
}
// Even when column stats are not fetched from the metastore, still estimate them
if (colStats == null) {
  colStats = Lists.newArrayList();
}
estimateStatsForMissingCols(neededColumns, colStats, table, conf, nr, schema);
// we should have stats for all columns (estimated or actual)
assert (neededColumns.size() == colStats.size());
long betterDS = getDataSizeFromColumnStats(nr, colStats);
ds = (betterDS < 1 || colStats.isEmpty()) ? ds : betterDS;
{code}
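To make the control flow of this proposal easier to check, here is a minimal self-contained sketch that runs without Hive. Everything in it is an assumption made for illustration: {{ColStat}}, {{fetchFromMetastore}} and {{estimateMissing}} are hypothetical stand-ins for {{ColStatistics}}, {{getTableColumnStats}} and {{estimateStatsForMissingCols}}, and the "estimate" is only a placeholder flag, not Hive's real estimation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ColStatsSketch {
    /** Hypothetical stand-in for Hive's ColStatistics. */
    static final class ColStat {
        final String name;
        final boolean estimated;

        ColStat(String name, boolean estimated) {
            this.name = name;
            this.estimated = estimated;
        }
    }

    /** Hypothetical metastore lookup: returns stats only for columns it knows. */
    static List<ColStat> fetchFromMetastore(List<String> needed,
                                            Map<String, ColStat> metastore) {
        List<ColStat> found = new ArrayList<>();
        for (String col : needed) {
            ColStat s = metastore.get(col);
            if (s != null) {
                found.add(s);
            }
        }
        return found;
    }

    /** Adds a placeholder estimate for every needed column without stats. */
    static void estimateMissing(List<String> needed, List<ColStat> colStats) {
        Set<String> have = new HashSet<>();
        for (ColStat s : colStats) {
            have.add(s.name);
        }
        for (String col : needed) {
            if (have.add(col)) {
                colStats.add(new ColStat(col, true));
            }
        }
    }

    /** Fetch only when the flag allows it, but always fill in estimates. */
    static List<ColStat> collect(List<String> needed, boolean fetchColStats,
                                 Map<String, ColStat> metastore) {
        List<ColStat> colStats = null;
        if (fetchColStats) {
            colStats = fetchFromMetastore(needed, metastore);
        }
        if (colStats == null) {
            colStats = new ArrayList<>();
        }
        estimateMissing(needed, colStats);
        return colStats;
    }

    public static void main(String[] args) {
        Map<String, ColStat> metastore = new HashMap<>();
        metastore.put("id", new ColStat("id", false));
        List<String> needed = Arrays.asList("id", "name");
        for (boolean fetch : new boolean[] {true, false}) {
            System.out.println("fetchColStats=" + fetch);
            for (ColStat s : collect(needed, fetch, metastore)) {
                System.out.println("  " + s.name + " estimated=" + s.estimated);
            }
        }
    }
}
```

In both cases the result has one entry per needed column, which is what the assert in the proposed change relies on. With the flag off, every entry is an estimate and the metastore stand-in is never consulted.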
> Use properties from HiveConf about "fetchColStats" and "fetchPartStats" in
> RelOptHiveTable#updateColStats
> ---------------------------------------------------------------------------------------------------------
>
> Key: HIVE-17634
> URL: https://issues.apache.org/jira/browse/HIVE-17634
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Attachments: HIVE-17634.patch
>
>
> In
> [RelOptHiveTable#updateColStats|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L309],
> we pass {{true}} for both {{fetchColStats}} and {{fetchPartStats}} when
> calling {{StatsUtils.collectStatistics}}:
> {code}
> if (!hiveTblMetadata.isPartitioned()) {
>   // 2.1 Handle the case for unpartitioned table.
>   try {
>     Statistics stats = StatsUtils.collectStatistics(hiveConf, null,
>         hiveTblMetadata, hiveNonPartitionCols,
>         nonPartColNamesThatRqrStats,
>         colStatsCached, nonPartColNamesThatRqrStats, true, true);
>     ...
> {code}
> This causes column statistics to be queried from the metastore even when
> {{hive.stats.fetch.column.stats}} and {{hive.stats.fetch.partition.stats}}
> are set to false in HiveConf. If these two properties are false, no column
> statistics should be fetched from the metastore. I suggest taking the two
> values from HiveConf instead of hard-coding {{true}}.
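
The suggestion in the description can be sketched as follows. This is an illustration only and does not use Hive classes: {{java.util.Properties}} stands in for HiveConf, and {{collectStatistics}} here is a hypothetical stub that merely reports the flags it received. An unset key reads as false in this sketch, which is a choice of the sketch and not a statement about Hive's defaults.

```java
import java.util.Properties;

public class FetchFlagsSketch {
    static final String FETCH_COL_STATS = "hive.stats.fetch.column.stats";
    static final String FETCH_PART_STATS = "hive.stats.fetch.partition.stats";

    /** Reads a boolean property; a missing key yields false in this sketch. */
    static boolean flag(Properties conf, String key) {
        return Boolean.parseBoolean(conf.getProperty(key));
    }

    /** Hypothetical stub for StatsUtils.collectStatistics: reports its flags. */
    static String collectStatistics(boolean fetchColStats, boolean fetchPartStats) {
        return "fetchColStats=" + fetchColStats + ", fetchPartStats=" + fetchPartStats;
    }

    /** Passes the configured values through instead of hard-coded true, true. */
    static String updateColStats(Properties conf) {
        return collectStatistics(flag(conf, FETCH_COL_STATS),
                                 flag(conf, FETCH_PART_STATS));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(FETCH_COL_STATS, "false");
        conf.setProperty(FETCH_PART_STATS, "true");
        System.out.println(updateColStats(conf));
    }
}
```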
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)