[ 
https://issues.apache.org/jira/browse/HIVE-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-29432:
----------------------------------
    Labels: pull-request-available  (was: )

> Autogather column statistics missing for tables containing a column with an unsupported type
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29432
>                 URL: https://issues.apache.org/jira/browse/HIVE-29432
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.3.0
>            Reporter: Thomas Rebele
>            Priority: Major
>              Labels: pull-request-available
>
> Given the following qfile:
> {code:java}
> set hive.stats.kll.enable=true;
> set metastore.stats.fetch.bitvector=true;
> set metastore.stats.fetch.kll=true;
> set hive.stats.autogather=true;
> set hive.stats.column.autogather=true;
> CREATE TABLE test_stats0 (a int, b timestamp) STORED AS TEXTFILE;
> CREATE TABLE test_stats1 (a int, b timestamp with local time zone) STORED AS TEXTFILE;
> INSERT INTO test_stats0 (a, b) VALUES (1, "2020-11-02 00:00:00");
> INSERT INTO test_stats1 (a, b) VALUES (1, "2020-11-02 00:00:00");
> DESCRIBE FORMATTED test_stats0 a;
> DESCRIBE FORMATTED test_stats0 b;
> DESCRIBE FORMATTED test_stats1 a;
> DESCRIBE FORMATTED test_stats1 b;
>  {code}
> The statistics for test_stats0 column a are computed successfully:
> {code:java}
> POSTHOOK: Input: default@test_stats0
> col_name              a                   
> data_type             int                 
> min                   1                   
> max                   1                   
> num_nulls             0                   
> distinct_count        1                   
> avg_col_len                               
> max_col_len                               
> num_trues                                 
> num_falses                                
> bit_vector            HL                  
> histogram             Q1: 1, Q2: 1, Q3: 1 
> {code}
> However, the statistics for test_stats1 column a are missing:
> {code:java}
> POSTHOOK: Input: default@test_stats1
> col_name              a                   
> data_type             int                 
> min                                       
> max                                       
> num_nulls                                 
> distinct_count                            
> avg_col_len                               
> max_col_len                               
> num_trues                                 
> num_falses                                
> bit_vector                                
> histogram                            
> {code}
> The same holds for column b: statistics are available for test_stats0 but 
> not for test_stats1.
> Even if statistics for a TIMESTAMP WITH LOCAL TIME ZONE column cannot be 
> computed, that should not prevent them from being gathered for the other 
> columns.
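>
> One way to narrow this down, offered as an untested assumption rather than 
> a confirmed workaround, is to request statistics explicitly for the 
> supported column only and compare the result with the autogather output 
> shown earlier. The statements reuse test_stats1 from the qfile; what they 
> return is not covered by this report.
> {code:sql}
> -- Hypothetical check: compute column statistics manually for column a only,
> -- leaving out the TIMESTAMP WITH LOCAL TIME ZONE column b.
> ANALYZE TABLE test_stats1 COMPUTE STATISTICS FOR COLUMNS a;
> -- Inspect whether min, max, num_nulls and distinct_count are now populated.
> DESCRIBE FORMATTED test_stats1 a;
> {code}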



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
