Gopal V created HIVE-19790:
------------------------------
Summary: Metastore upgrade: 3.1.0 upgrade script is slow and
non-idempotent
Key: HIVE-19790
URL: https://issues.apache.org/jira/browse/HIVE-19790
Project: Hive
Issue Type: Bug
Components: Standalone Metastore
Reporter: Gopal V
Because of the giant bit-vectors stored on mysql, the update of PART_COL_STATS
is very slow and also is not idempotent.
{code}
--------------
UPDATE `PART_COL_STATS`
SET `CAT_NAME` = 'hive'
--------------
Query OK, 0 rows affected (4 min 1.57 sec)
Rows matched: 778025 Changed: 0 Warnings: 0
{code}
Adding a filter speeds it up because it will no longer overwrite
{code}
mysql> explain UPDATE `PART_COL_STATS` SET `CAT_NAME` = 'hive' where `CAT_NAME`
='';
--------------
explain UPDATE `PART_COL_STATS` SET `CAT_NAME` = 'hive' where `CAT_NAME` =''
--------------
+----+-------------+----------------+-------+---------------+---------------+---------+-------+------+------------------------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+----------------+-------+---------------+---------------+---------+-------+------+------------------------------+
| 1 | SIMPLE | PART_COL_STATS | range | PCS_STATS_IDX | PCS_STATS_IDX |
258 | const | 1 | Using where; Using temporary |
+----+-------------+----------------+-------+---------------+---------------+---------+-------+------+------------------------------+
1 row in set (0.00 sec)
{code}
this would be much faster to re-run and would not accidentally overwrite any
existing CAT_NAMEs.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)