Re: [PR] HIVE-27725: Remove redundant columns in TAB_COL_STATS and PART_COL_STATS tables [hive]

via GitHub Wed, 03 Jan 2024 03:17:56 -0800


wecharyu commented on code in PR #4744:
URL: https://github.com/apache/hive/pull/4744#discussion_r1440337096



##########
standalone-metastore/metastore-server/src/main/sql/postgres/hive-schema-4.0.0-beta-2.postgres.sql:
##########
@@ -1195,7 +1188,7 @@ CREATE INDEX "PART_PRIVS_N49" ON "PART_PRIVS" USING btree ("PART_ID");
 -- Name: PCS_STATS_IDX; Type: INDEX; Schema: public; Owner: hiveuser; Tablespace:
 --
 
-CREATE INDEX "PCS_STATS_IDX" ON "PART_COL_STATS" USING btree ("DB_NAME","TABLE_NAME","COLUMN_NAME","PARTITION_NAME","CAT_NAME");
+CREATE INDEX "PCS_STATS_IDX" ON "PART_COL_STATS" USING btree ("PART_ID","COLUMN_NAME");

Review Comment:
   Nice catch! I concentrated on the MySQL side and forgot the others, such as
Postgres. The updated index `PART_COL_STATS(PART_ID, COLUMN_NAME)` actually
covers `PART_COL_STATS_N49 (PART_ID)`, so we could delete the latter, which is
what has already been done for MySQL. After that, the query should use the
updated `PART_COL_STATS` index. Could you test it? @dengzhhu653
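
   To illustrate why the composite index makes the single-column one redundant, here is a minimal sketch. It is not the Hive schema or a Postgres/MySQL test: it assumes an in-memory SQLite database as a stand-in, with a cut-down, hypothetical `PART_COL_STATS` table holding only the columns named above. It only shows that a lookup on the leading column `PART_ID` alone can already be served by `(PART_ID, COLUMN_NAME)`; planner choices in Postgres and MySQL still need to be checked on those engines.

```python
import sqlite3

# Assumption: SQLite stands in for the metastore database, and the table is a
# simplified, hypothetical version of PART_COL_STATS (only the indexed columns).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PART_COL_STATS (
        CS_ID       INTEGER PRIMARY KEY,
        PART_ID     INTEGER NOT NULL,
        COLUMN_NAME TEXT    NOT NULL
    );
    -- Only the composite index exists; there is no separate index on PART_ID.
    CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (PART_ID, COLUMN_NAME);
    """
)


def plan(sql):
    """Return the planner's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Lookup on the leading column only (what a PART_ID-only index would serve).
by_part = plan("SELECT * FROM PART_COL_STATS WHERE PART_ID = 1")

# Lookup on both indexed columns.
by_part_and_col = plan(
    "SELECT * FROM PART_COL_STATS WHERE PART_ID = 1 AND COLUMN_NAME = 'c'"
)

print(by_part)
print(by_part_and_col)
```

   Both plans should name `PCS_STATS_IDX`, which is the leftmost-prefix behavior that lets the `(PART_ID)` index be dropped. On Postgres or MySQL, running `EXPLAIN` on the real queries would be the equivalent check.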
   
   BTW, we have applied #4831 in our prod env and added a new index
`UNIQUE_PART (TBL_ID, PART_NAME)`. MySQL did not choose the best index in this
case either, so we dropped the redundant index `PARTITIONS_N49 (TBL_ID)`, and
the performance of `getPartitionsByFilter` improved greatly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

