[ 
https://issues.apache.org/jira/browse/HIVE-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185281#comment-16185281
 ] 

liyunzhang_intel commented on HIVE-17634:
-----------------------------------------

[~vgarg]: thanks for your reply. I understand why column stats matter for 
estimating statistics. What confuses me is that in the logical plan we pass 
{{true}} to fetch the column stats from the metastore, and even when we cannot 
get a [result 
|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L351]
 from the metastore we still call 
[estimateStatsForMissingCols|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L354].
 But in the statistics estimation 
([StatsRulesProcFactory|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L134]),
 we do not even estimate the column stats once 
{{hive.stats.fetch.column.stats}} is set to false. Could we refactor 
[StatsUtils#collectStatistics|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L349]
 along these lines?
{code}
if (fetchColStats) {
  colStats = getTableColumnStats(table, schema, neededColumns, colStatsCache);
}
// Even when column stats are not fetched from the metastore, still estimate them
if (colStats == null) {
  colStats = Lists.newArrayList();
}

estimateStatsForMissingCols(neededColumns, colStats, table, conf, nr, schema);

// we should have stats for all columns (estimated or actual)
assert(neededColumns.size() == colStats.size());
long betterDS = getDataSizeFromColumnStats(nr, colStats);
ds = (betterDS < 1 || colStats.isEmpty()) ? ds : betterDS;
{code}
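
To make the intended behavior concrete, here is a minimal standalone sketch of the same control flow. It does not use any Hive classes: {{ColStatsSketch}}, {{ColStat}}, {{collect}}, and the map standing in for the metastore are all hypothetical names, and the "one distinct value per row" estimate is only a placeholder, not Hive's real estimator. The point is just that every needed column ends up with stats, whether fetched or estimated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ColStatsSketch {
    /** Hypothetical stand-in for a per-column statistics object. */
    static final class ColStat {
        final String column;
        final long distinctValues;
        final boolean estimated;

        ColStat(String column, long distinctValues, boolean estimated) {
            this.column = column;
            this.distinctValues = distinctValues;
            this.estimated = estimated;
        }
    }

    /**
     * Fetches stats from the (fake) metastore only when fetchColStats is true,
     * then fills in an estimate for every needed column that is still missing.
     */
    static List<ColStat> collect(boolean fetchColStats, Map<String, Long> metastore,
                                 List<String> neededColumns, long numRows) {
        List<ColStat> colStats = new ArrayList<>();
        if (fetchColStats) {
            for (String col : neededColumns) {
                Long ndv = metastore.get(col);
                if (ndv != null) {
                    colStats.add(new ColStat(col, ndv, false));
                }
            }
        }
        // Estimate whatever is missing, regardless of the fetch flag.
        Set<String> present = new HashSet<>();
        for (ColStat cs : colStats) {
            present.add(cs.column);
        }
        for (String col : neededColumns) {
            if (!present.contains(col)) {
                // Placeholder heuristic (assumption): treat every row as distinct.
                colStats.add(new ColStat(col, numRows, true));
            }
        }
        return colStats;
    }

    public static void main(String[] args) {
        Map<String, Long> metastore = new HashMap<>();
        metastore.put("id", 1000L);
        List<String> needed = Arrays.asList("id", "name");
        for (boolean fetch : new boolean[] {true, false}) {
            for (ColStat cs : collect(fetch, metastore, needed, 5000L)) {
                System.out.println("fetch=" + fetch + " " + cs.column
                    + " ndv=" + cs.distinctValues + " estimated=" + cs.estimated);
            }
        }
    }
}
```

With the flag off, nothing is read from the map, yet the returned list still has one entry per needed column, which is the invariant the {{assert}} in the proposal above checks.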

> Use properties from HiveConf about "fetchColStats" and "fetchPartStats" in 
> RelOptHiveTable#updateColStats
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17634
>                 URL: https://issues.apache.org/jira/browse/HIVE-17634
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-17634.patch
>
>
> In 
> [RelOptHiveTable#updateColStats|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L309],
>  we pass {{fetchColStats}} and {{fetchPartStats}} as true when calling 
> {{StatsUtils.collectStatistics}}:
> {code}
>    if (!hiveTblMetadata.isPartitioned()) {
>         // 2.1 Handle the case for unpartitioned table.
>         try {
>           Statistics stats = StatsUtils.collectStatistics(hiveConf, null,
>               hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats,
>               colStatsCached, nonPartColNamesThatRqrStats, true, true);
>       ...
> {code}
> This causes column statistics to be queried from the metastore even when 
> {{hive.stats.fetch.column.stats}} and {{hive.stats.fetch.partition.stats}} are 
> set to false in HiveConf. If these two properties are false, we should not 
> fetch any column statistics from the metastore. I suggest taking the values 
> from HiveConf instead of hard-coding true.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
