[ 
https://issues.apache.org/jira/browse/HIVE-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185327#comment-16185327
 ] 

liyunzhang_intel commented on HIVE-17634:
-----------------------------------------

[~vgarg]: thanks for your explanation.
{quote}
I am not convinced why would user not want to fetch stats from metastore and 
instead rely upon estimated statistics?
{quote}
The documentation says, "Fetching column statistics for each needed column 
can be expensive when the number of columns is high". The default value of 
hive.stats.fetch.column.stats is false. Users may leave this property disabled 
because they would first need to run 
{{analyze table xxx compute statistics for columns}} to collect column 
statistics, and that command is time-consuming for tables with a large number 
of columns.
{code}
    HIVE_STATS_FETCH_COLUMN_STATS("hive.stats.fetch.column.stats", false,
        "Annotation of operator tree with statistics information requires column statistics.\n" +
        "Column statistics are fetched from metastore. Fetching column statistics for each needed column\n" +
        "can be expensive when the number of columns is high. This flag can be used to disable fetching\n" +
        "of column statistics from metastore."),
{code}
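
To make the change proposed in the issue concrete, here is a minimal sketch of reading both flags from the configuration and passing them on, rather than passing literal {{true}} values. It is not the attached patch. The {{Map}} stands in for HiveConf, {{getBool}} and {{collectStatistics}} are hypothetical stand-ins for the real Hive calls, and the default of {{true}} for hive.stats.fetch.partition.stats is an assumption that should be checked against HiveConf.

```java
import java.util.HashMap;
import java.util.Map;

class FetchStatsFlagsSketch {
    static final String FETCH_COL_STATS = "hive.stats.fetch.column.stats";
    static final String FETCH_PART_STATS = "hive.stats.fetch.partition.stats";

    // Hypothetical stand-in for a HiveConf boolean lookup: the map plays the role of the conf.
    static boolean getBool(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    // Hypothetical stand-in for StatsUtils.collectStatistics: it only reports the flags it received.
    static String collectStatistics(boolean fetchColStats, boolean fetchPartStats) {
        return "fetchColStats=" + fetchColStats + ", fetchPartStats=" + fetchPartStats;
    }

    // Sketch of the suggested behaviour: both flags come from the configuration.
    static String updateColStats(Map<String, String> conf) {
        // Default false matches the HiveConf snippet quoted above.
        boolean fetchColStats = getBool(conf, FETCH_COL_STATS, false);
        // Default true is an assumption made for this sketch.
        boolean fetchPartStats = getBool(conf, FETCH_PART_STATS, true);
        return collectStatistics(fetchColStats, fetchPartStats);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(updateColStats(conf)); // nothing set: defaults apply

        conf.put(FETCH_COL_STATS, "true");
        conf.put(FETCH_PART_STATS, "false");
        System.out.println(updateColStats(conf)); // explicit settings are honoured
    }
}
```

With this shape, a user who leaves hive.stats.fetch.column.stats at false no longer triggers a column statistics lookup in this code path.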


> Use properties from HiveConf about "fetchColStats" and "fetchPartStats" in 
> RelOptHiveTable#updateColStats
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17634
>                 URL: https://issues.apache.org/jira/browse/HIVE-17634
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-17634.patch
>
>
> In 
> [RelOptHiveTable#updateColStats|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L309],
>  {{fetchColStats}} and {{fetchPartStats}} are hardcoded to true in the call to 
> {{StatsUtils.collectStatistics}}:
> {code}
>    if (!hiveTblMetadata.isPartitioned()) {
>         // 2.1 Handle the case for unpartitioned table.
>         try {
>           Statistics stats = StatsUtils.collectStatistics(hiveConf, null,
>               hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats,
>               colStatsCached, nonPartColNamesThatRqrStats, true, true);
>       ...
> {code}
> As a result, column statistics are queried from the metastore even when 
> {{hive.stats.fetch.column.stats}} and {{hive.stats.fetch.partition.stats}} are 
> set to false in HiveConf. If these two properties are false, no column 
> statistics should be fetched from the metastore. I suggest taking both values 
> from HiveConf. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
