[
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=861723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861723
]
ASF GitHub Bot logged work on HIVE-27163:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 12/May/23 10:47
Start Date: 12/May/23 10:47
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1192213156
##########
iceberg/iceberg-handler/src/test/results/positive/col_stats.q.out:
##########
@@ -339,17 +339,16 @@ POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@tbl_ice_puffin
col_name a
data_type int
-min 1
-max 333
-num_nulls 0
-distinct_count 7
+min
+max
+num_nulls
+distinct_count
Review Comment:
This part of the output corresponds to the following code snippet.
```
set hive.iceberg.stats.source=iceberg;
drop table if exists tbl_ice_puffin;
create external table tbl_ice_puffin(a int, b string, c int) stored by iceberg tblproperties ('format-version'='2');
insert into tbl_ice_puffin values (1, 'one', 50), (2, 'two', 51), (2, 'two', 51), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
explain select * from tbl_ice_puffin order by a, b, c;
select * from tbl_ice_puffin order by a, b, c;
select count(*) from tbl_ice_puffin ;
desc formatted tbl_ice_puffin a;
```
In this case, the output of `desc formatted tbl_ice_puffin a;` is accurate.
Issue Time Tracking
-------------------
Worklog Id: (was: 861723)
Time Spent: 5.5h (was: 5h 20m)
> Column stats are not getting published after an insert query into an external
> table with custom location
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>
>
> {noformat}
> #### A masked pattern was here ####
> PREHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
> #### A masked pattern was here ####
> POSTHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name age
> data_type int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> comment from deserializer{noformat}
> As the desc formatted output shows, column stats were not populated.
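>
> For comparison, a minimal sketch (added for illustration and not part of the original test case): column stats can also be gathered explicitly with ANALYZE TABLE, which does not rely on autogather during the insert. Re-running the describe afterwards gives a baseline to compare against the empty output above.
> {noformat}
> -- Explicitly compute column stats for the table created in test.q
> analyze table test_custom compute statistics for columns;
> -- Re-check the column; this path is independent of hive.stats.column.autogather
> desc formatted test_custom age;{noformat}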
>