[
https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947732#comment-14947732
]
Siddharth Seth commented on HIVE-11786:
---------------------------------------
No difference with the remaining indexes.
(The index creation takes a long time btw - and may impact stat generation ?)
{code}
2015-10-07T18:38:09,444 DEBUG [main([])]: metastore.MetaStoreDirectSql
(MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 16195.018669ms
+ 0.058186ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE",
min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"),
max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)),
max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"),
max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"),
sum("NUM_FALSES"),
avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as
decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE"
as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as
decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME"
"DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME"
"PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE",
"PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE",
"PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE",
"PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS",
"PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES",
"PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN
"PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON
("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" =
"DBS"."DB_ID")) VW where "DB_NAME" = ? and "TABLE_NAME" = ? and "COLUMN_NAME"
in (?) and "PARTITION_NAME" in
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?
{code}
{code}
2015-10-07T18:38:29,309 DEBUG [main([])]: metastore.MetaStoreDirectSql
(MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 18651.1996ms +
0.050665ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE",
min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"),
max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)),
max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"),
max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"),
sum("NUM_FALSES"),
avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as
decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE"
as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as
decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME"
"DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME"
"PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE",
"PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE",
"PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE",
"PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS",
"PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES",
"PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN
"PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON
("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" =
"DBS"."DB_ID")) VW where "DB_NAME" = ? and "TABLE_NAME" = ? and "COLUMN_NAME"
in (?) and "PARTITION_NAME" in
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?
{code}
> Deprecate the use of redundant column in colunm stats related tables
> --------------------------------------------------------------------
>
> Key: HIVE-11786
> URL: https://issues.apache.org/jira/browse/HIVE-11786
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch,
> HIVE-11786.2.patch, HIVE-11786.patch
>
>
> The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns
> such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have
> foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS.
> These redundant columns violate database normalization rules and cause a lot
> of inconvenience (sometimes difficult) in column stats related feature
> implementation. For example, when renaming a table, we have to update
> TABLE_NAME column in these tables as well which is unnecessary.
> This JIRA is first to deprecate the use of these columns at HMS code level. A
> followed JIRA is to be opened to focus on DB schema change and upgrade.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)