[ 
https://issues.apache.org/jira/browse/HIVE-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18055899#comment-18055899
 ] 

Thomas Rebele commented on HIVE-29432:
--------------------------------------

The plan for calculating the statistics is added at the end of 
{{org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#genFileSinkPlan}}. The 
method {{canRunAutogatherStats(Operator curr)}} checks whether the types of all 
columns are supported. If any column has an unsupported type, the autogather 
pipeline is not added at all, so no column of the table gets statistics.
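
To illustrate the difference between the all-or-nothing check described above and the per-column behaviour the issue asks for, here is a minimal, self-contained sketch. It is not Hive's code: the class {{AutogatherSketch}}, the set {{SUPPORTED_TYPES}}, the method {{columnsEligibleForStats}} and the map-based signature of {{canRunAutogatherStats}} are all assumptions made for illustration (the real method takes an {{Operator}}), and the type strings are simply taken from the qfile in the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only; this is NOT Hive's SemanticAnalyzer code.
class AutogatherSketch {

    // Assumption: a made-up set of column types for which stats can be computed.
    static final Set<String> SUPPORTED_TYPES = Set.of("int", "timestamp");

    // All-or-nothing check, modelling the behaviour described in the comment:
    // one unsupported column type disables autogather for the whole table.
    static boolean canRunAutogatherStats(Map<String, String> columnTypes) {
        for (String type : columnTypes.values()) {
            if (!SUPPORTED_TYPES.contains(type)) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical per-column alternative: return the names of the columns
    // whose type is supported, so the remaining columns could still get stats.
    static List<String> columnsEligibleForStats(Map<String, String> columnTypes) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, String> column : columnTypes.entrySet()) {
            if (SUPPORTED_TYPES.contains(column.getValue())) {
                eligible.add(column.getKey());
            }
        }
        return eligible;
    }

    // Schema of test_stats1 from the qfile, as column name -> type.
    static Map<String, String> testStats1() {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("a", "int");
        columns.put("b", "timestamp with local time zone");
        return columns;
    }
}
```

In this model, {{canRunAutogatherStats(testStats1())}} returns false, which corresponds to the missing stats for column a, whereas {{columnsEligibleForStats(testStats1())}} would still keep column a. Whether the actual fix should take this shape is an open question.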

> Statistics missing for tables with a TIMESTAMP WITH LOCAL TIME ZONE
> -------------------------------------------------------------------
>
>                 Key: HIVE-29432
>                 URL: https://issues.apache.org/jira/browse/HIVE-29432
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.3.0
>            Reporter: Thomas Rebele
>            Priority: Major
>
> Given the following qfile:
> {code:java}
> set hive.stats.kll.enable=true;
> set metastore.stats.fetch.bitvector=true;
> set metastore.stats.fetch.kll=true;
> set hive.stats.autogather=true;
> set hive.stats.column.autogather=true;
> CREATE TABLE test_stats0 (a int, b timestamp) STORED AS TEXTFILE;
> CREATE TABLE test_stats1 (a int, b timestamp with local time zone) STORED AS 
> TEXTFILE;
> INSERT INTO test_stats0 (a, b) VALUES (1, "2020-11-02 00:00:00");
> INSERT INTO test_stats1 (a, b) VALUES (1, "2020-11-02 00:00:00");
> DESCRIBE FORMATTED test_stats0 a;
> DESCRIBE FORMATTED test_stats0 b;
> DESCRIBE FORMATTED test_stats1 a;
> DESCRIBE FORMATTED test_stats1 b;
>  {code}
> The statistics for test_stats0 column a are computed successfully:
> {code:java}
> POSTHOOK: Input: default@test_stats0
> col_name              a                   
> data_type             int                 
> min                   1                   
> max                   1                   
> num_nulls             0                   
> distinct_count        1                   
> avg_col_len                               
> max_col_len                               
> num_trues                                 
> num_falses                                
> bit_vector            HL                  
> histogram             Q1: 1, Q2: 1, Q3: 1 
> {code}
> However, the statistics for test_stats1 column a are missing:
> {code:java}
> POSTHOOK: Input: default@test_stats1
> col_name              a                   
> data_type             int                 
> min                                       
> max                                       
> num_nulls                                 
> distinct_count                            
> avg_col_len                               
> max_col_len                               
> num_trues                                 
> num_falses                                
> bit_vector                                
> histogram                            
> {code}
> The same holds for column b: stats are available for table test_stats0, but 
> not for test_stats1.
> Even if the stats for a TIMESTAMP WITH LOCAL TIME ZONE column cannot be 
> calculated, it should not affect the other columns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
