[https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=860937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860937]
ASF GitHub Bot logged work on HIVE-27163:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/May/23 08:08
Start Date: 08/May/23 08:08
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1187154337
##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##########
@@ -921,14 +925,23 @@ public Table toTable(HiveConf conf) throws HiveException {
// When replicating the statistics for a table will be obtained from the source. Do not
// reset it on replica.
if (replicationSpec == null || !replicationSpec.isInReplicationScope()) {
-   if (!this.isCTAS && (tbl.getPath() == null || (!isExternal() && tbl.isEmpty()))) {
-     if (!tbl.isPartitioned() && conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
-       StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(),
-           MetaStoreUtils.getColumnNames(tbl.getCols()), StatsSetupConst.TRUE);
-     }
-   } else {
-     StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), null,
-         StatsSetupConst.FALSE);
+   // Remove COLUMN_STATS_ACCURATE=true from table's parameter, let the HMS determine if
+   // there is need to add column stats dependent on the table's location.
+   StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), null,
+       StatsSetupConst.FALSE);
+   if (!this.isCTAS && !tbl.isPartitioned() && !tbl.isTemporary() &&
+       conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+     // Put the flag into the dictionary in order not to pollute the table,
+     // ObjectDictionary is meant to convey repeatitive messages.
+     ObjectDictionary dictionary = tbl.getTTable().isSetDictionary() ?
+         tbl.getTTable().getDictionary() : new ObjectDictionary();
+     List<ByteBuffer> buffers = new ArrayList<>();
+     String statsSetup = StatsSetupConst.ColumnStatsSetup.getStatsSetupAsString(true,
+         tbl.isIcebergTable() ? "metadata" : null, // Skip metadata directory for Iceberg table
Review Comment:
The `HiveStorageHandler` does not have an API for this purpose, and I'm a little nervous about introducing a new one in `HiveStorageHandler`.
I have removed `isIcebergTable()` from the `Table` class and now use `storageHandler.isMetadataTableSupported()` instead (currently only Iceberg tables support it).
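A minimal sketch of the capability check the comment describes, in plain Java. `StorageHandlerLike` and `dirToSkip` are hypothetical stand-ins, not Hive APIs; only the method name `isMetadataTableSupported()` and the "metadata" directory name come from the discussion above. The caller asks the table's storage handler whether it supports metadata tables, rather than asking the table whether it is an Iceberg table, so another handler could opt in later without touching `Table`.

```java
public class MetadataDirSkip {

    /** Hypothetical stand-in for the relevant slice of HiveStorageHandler (not the real interface). */
    interface StorageHandlerLike {
        // Mirrors the capability named in the review; handlers without metadata tables keep the default.
        default boolean isMetadataTableSupported() {
            return false;
        }
    }

    /**
     * Hypothetical helper: returns the directory to skip when setting up column stats,
     * or null when nothing should be skipped.
     */
    static String dirToSkip(StorageHandlerLike handler) {
        // Native tables have no storage handler at all, so null means "skip nothing".
        if (handler != null && handler.isMetadataTableSupported()) {
            return "metadata";
        }
        return null;
    }

    public static void main(String[] args) {
        StorageHandlerLike icebergLike = new StorageHandlerLike() {
            @Override
            public boolean isMetadataTableSupported() {
                return true;
            }
        };
        StorageHandlerLike plainHandler = new StorageHandlerLike() {};

        System.out.println(dirToSkip(icebergLike));  // metadata
        System.out.println(dirToSkip(plainHandler)); // null
        System.out.println(dirToSkip(null));         // null
    }
}
```

With a handler that reports support, the helper returns "metadata"; for a plain handler, or a native table with no handler, it returns null, which matches the `tbl.isIcebergTable() ? "metadata" : null` argument in the diff.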
Issue Time Tracking
-------------------
Worklog Id: (was: 860937)
Time Spent: 4h (was: 3h 50m)
> Column stats are not getting published after an insert query into an external table with custom location
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h
> Remaining Estimate: 0h
>
> Test case details are below:
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>
>
> {noformat}
> #### A masked pattern was here ####
> PREHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
> #### A masked pattern was here ####
> POSTHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name age
> data_type int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> comment from deserializer{noformat}
> As the desc formatted output shows, the column stats (min, max, num_nulls, distinct_count, etc.) were not populated.
>
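To see why the external table in test.q was skipped, the old and new client-side conditions from the diff can be compared side by side. This is a hypothetical sketch: `TableShape` and its fields are invented stand-ins for the `Table` methods used in the diff, and it assumes `tbl.getPath()` is null only when no LOCATION clause was given. Under the old rule, an external table with an explicit location never gets COLUMN_STATS_ACCURATE set to true; under the new rule, the client only requests the stats setup and leaves the location-dependent decision to the HMS.

```java
public class StatsPredicateSketch {

    /** Hypothetical snapshot of the table properties the two rules read (not a Hive type). */
    record TableShape(boolean ctas, boolean hasExplicitLocation, boolean external,
                      boolean empty, boolean partitioned, boolean temporary) {}

    /**
     * Old rule (removed lines): COLUMN_STATS_ACCURATE was set to TRUE for the columns
     * only when the table had no explicit location, or was a managed table whose
     * location was empty.
     */
    static boolean oldMarksStatsAccurate(TableShape t, boolean statsAutogather) {
        boolean freshLocation = !t.hasExplicitLocation() || (!t.external() && t.empty());
        return !t.ctas() && freshLocation && !t.partitioned() && statsAutogather;
    }

    /**
     * New rule (added lines): the stats state is always reset to FALSE first; this
     * predicate only says whether the client asks the HMS to set up column stats.
     * The location is no longer checked on the client side.
     */
    static boolean newRequestsStatsSetup(TableShape t, boolean statsAutogather) {
        return !t.ctas() && !t.partitioned() && !t.temporary() && statsAutogather;
    }

    public static void main(String[] args) {
        // The table from test.q: external, custom LOCATION, not partitioned, not temporary.
        TableShape externalCustom = new TableShape(false, true, true, true, false, false);
        System.out.println(oldMarksStatsAccurate(externalCustom, true)); // false
        System.out.println(newRequestsStatsSetup(externalCustom, true)); // true

        // A temporary managed table without LOCATION: accepted before, excluded now.
        TableShape tempTable = new TableShape(false, false, false, true, false, true);
        System.out.println(oldMarksStatsAccurate(tempTable, true)); // true
        System.out.println(newRequestsStatsSetup(tempTable, true)); // false
    }
}
```

The second case in the sketch shows a side effect of the new condition: temporary tables, which the old rule did not single out, are now excluded explicitly.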
--
This message was sent by Atlassian Jira
(v8.20.10#820010)