wecharyu commented on PR #5141:
URL: https://github.com/apache/hive/pull/5141#issuecomment-2065722093

   Hi @cxzl25, I have made a simple test on a table with 11 partition levels 
and over 3 millions partitions, the result shows a performance regression:
   ```sql
   -- Before this patch
   mysql> select count(1) from PARTITIONS where TBL_ID=96844 and PART_NAME like 
'grass_region=BR/%';
   +----------+
   | count(1) |
   +----------+
   |   263424 |
   +----------+
   1 row in set (1.32 sec)
   
   -- After this patch
   mysql> select count(1) from PARTITIONS where TBL_ID=96844 and PART_NAME like 
'grass_region=BR/regional_date=_%/regional_hour=_%/log_type=_%/event_type=_%/domain_type=_%/local_date=_%/local_hour=_%/utc_date=_%/utc_hour=_%/datehour=_%';
   +----------+
   | count(1) |
   +----------+
   |   263448 |
   +----------+
   1 row in set (1.39 sec)
   ```
   
   We can analyze these queries:
   ```sql
   -- Before this patch
   mysql> explain select count(1) from PARTITIONS where TBL_ID=96844 and 
PART_NAME like 'grass_region=BR/%';
   
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+------+---------+----------+--------------------------+
   | id | select_type | table      | partitions | type  | possible_keys         
      | key         | key_len | ref  | rows    | filtered | Extra               
     |
   
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+------+---------+----------+--------------------------+
   |  1 | SIMPLE      | PARTITIONS | NULL       | range | 
UNIQUEPARTITION,UNIQUE_PART | UNIQUE_PART | 779     | NULL | 6586446 |   100.00 
| Using where; Using index |
   
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+------+---------+----------+--------------------------+
   1 row in set, 1 warning (0.01 sec)
   
   -- After this patch
   mysql> explain select count(1) from PARTITIONS where TBL_ID=96844 and 
PART_NAME like 
'grass_region=BR/regional_date=_%/regional_hour=_%/log_type=_%/event_type=_%/domain_type=_%/local_date=_%/local_hour=_%/utc_date=_%/utc_hour=_%/datehour=_%';
   
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+------+---------+----------+--------------------------+
   | id | select_type | table      | partitions | type  | possible_keys         
      | key         | key_len | ref  | rows    | filtered | Extra               
     |
   
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+------+---------+----------+--------------------------+
   |  1 | SIMPLE      | PARTITIONS | NULL       | range | 
UNIQUEPARTITION,UNIQUE_PART | UNIQUE_PART | 779     | NULL | 6586446 |   100.00 
| Using where; Using index |
   
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+------+---------+----------+--------------------------+
   1 row in set, 1 warning (0.00 sec)
   ```
   Actually both patch will use the partial index of `UNIQUE_PART (TBL_ID, 
PART_NAME)`, and match the like clause for PART_NAME. And since the PART_ID is 
certain, all the following PART_NAME format is the same, so the like match will 
not benefit from with more `part_col=_%` here. On the contrary, more wildcard 
matches may impact performance.
   
   BTW, in the https://github.com/apache/hive/pull/4831#discussion_r1530023612, 
 I think the rows of explain result should be an estimated result, we can not 
determine the performance by it completely. Could you check it again?, and pls 
correct me if I have any mistakes.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to