[
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=859117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859117
]
ASF GitHub Bot logged work on HIVE-27163:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/23 09:19
Start Date: 26/Apr/23 09:19
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1177588895
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -508,6 +511,48 @@ public static void clearQuickStats(Map<String, String> params) {
     params.remove(StatsSetupConst.NUM_ERASURE_CODED_FILES);
   }
+  public static void updateTableStatsForCreateTable(Warehouse wh, Database db, Table tbl,
+      EnvironmentContext envContext, Configuration conf, Path tblPath, boolean newDir)
+      throws MetaException {
+    // If the created table is a view, skip generating the stats
+    if (MetaStoreUtils.isView(tbl)) {
+      return;
+    }
+    assert tblPath != null;
+    if (tbl.isSetDictionary() && tbl.getDictionary().getValues() != null) {
+      List<java.nio.ByteBuffer> values = tbl.getDictionary().getValues().
+          remove(StatsSetupConst.STATS_FOR_CREATE_TABLE);
+      java.nio.ByteBuffer buffer;
+      if (values != null && values.size() > 0 && (buffer = values.get(0)).hasArray()) {
+        String val = new String(buffer.array(), StandardCharsets.UTF_8);
+        if (StatsSetupConst.TRUE.equals(val)) {
+          try {
+            boolean isIcebergTable =
+                HiveMetaHook.ICEBERG.equalsIgnoreCase(tbl.getParameters().get(HiveMetaHook.TABLE_TYPE));
+            PathFilter pathFilter = isIcebergTable ?
+                path -> !"metadata".equals(path.getName()) : FileUtils.HIDDEN_FILES_PATH_FILTER;
Review Comment:
Move this part to `Table#isIcebergTable` on the client; see
https://github.com/apache/hive/pull/4228/files#diff-a88cd54666ea2466fd0c0f1323efecdc0be83a3fbf726d471f94080e31affc15
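A minimal sketch of what such a helper might look like, assuming it only needs the table parameters map. The class name `IcebergTableCheck` is hypothetical, and the two constants are assumed stand-ins for `HiveMetaHook.TABLE_TYPE` and `HiveMetaHook.ICEBERG`, redefined locally so the example compiles without Hive on the classpath. The null guard on the parameters map is an addition that is not in the diff above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone illustration; not the actual Table#isIcebergTable.
class IcebergTableCheck {
  // Assumed values of HiveMetaHook.TABLE_TYPE and HiveMetaHook.ICEBERG.
  static final String TABLE_TYPE = "table_type";
  static final String ICEBERG = "ICEBERG";

  // Returns true when the table parameters mark the table as Iceberg.
  // The comparison is case-insensitive, as in the diff; a null map or a
  // missing key yields false instead of a NullPointerException.
  static boolean isIcebergTable(Map<String, String> params) {
    return params != null && ICEBERG.equalsIgnoreCase(params.get(TABLE_TYPE));
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put(TABLE_TYPE, "iceberg");
    System.out.println(isIcebergTable(params));          // true
    System.out.println(isIcebergTable(new HashMap<>())); // false
    System.out.println(isIcebergTable(null));            // false
  }
}
```

With a helper of this shape, the server-side code would only have to pick the `PathFilter`, and the Iceberg detection would live in one place.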
Issue Time Tracking
-------------------
Worklog Id: (was: 859117)
Time Spent: 2h 10m (was: 2h)
> Column stats are not getting published after an insert query into an external
> table with custom location
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>
>
> {noformat}
> #### A masked pattern was here ####
> PREHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
> #### A masked pattern was here ####
> POSTHOOK: type: CREATETABLE
> #### A masked pattern was here ####
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name age
> data_type int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> comment from deserializer{noformat}
> As the desc formatted output shows, the column stats were not populated.
>