HEPBO3AH opened a new issue, #9612:
URL: https://github.com/apache/hudi/issues/9612

   Hi, we are using Hudi on AWS. We have noticed the following unexpected behavior.
   
   A `SELECT * FROM table` creates a significant number of S3 calls:
   
   ```
   +---------------------------------------------------------------------------------------------------------------+----------+---+
   |path                                                                                                           |httpMethod|cnt|
   +---------------------------------------------------------------------------------------------------------------+----------+---+
   |my_table/.hoodie                                                                                               |HEAD      |5  |
   |my_table/.hoodie/                                                                                              |HEAD      |5  |
   |my_table/.hoodie/.aux/.bootstrap/.partitions/00000000-0000-0000-0000-000000000000-0_1-0-1_00000000000001.hfile |HEAD      |5  |
   |my_table/.hoodie/.aux/.bootstrap/.partitions/00000000-0000-0000-0000-000000000000-0_1-0-1_00000000000001.hfile/|HEAD      |5  |
   |my_table/.hoodie/20221124035739002.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20221127222955674.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20221128000946056.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230203015652867.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230203034909027.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230323023115954.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230323024631265.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230323041457900.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230627223911673.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230706040420663.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230821012127985.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230821013120957.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/20230823042339397.replacecommit                                                               |GET       |5  |
   |my_table/.hoodie/hoodie.properties                                                                             |GET       |5  |
   |my_table/site_id%253D21/42d99963-db7f-400f-9e33-d539c74672aa-0_0-79-6549_20230323023115954.parquet             |GET       |3  |
   |my_table/site_id%253D22/431ca0d1-8af3-4a72-bd17-31f2cd7e97e9-0_0-39-5633_20230323023017903.parquet             |GET       |3  |
   |my_table/site_id%253D23/36675fed-d8ab-4532-aedc-dddf0a32accb-0_1-80-6551_20230323024631265.parquet             |GET       |3  |
   |my_table/site_id%253D24/20efa6e4-489c-4b1a-a474-5dc1731485ed-0_0-80-6551_20230323041457900.parquet             |GET       |3  |
   |my_table/site_id%253D30/15bfe605-bced-4d17-b571-62ebe64c5e97-0_0-80-6552_20230823042339397.parquet             |GET       |3  |
   |my_table/site_id%253D30/27d907a2-7485-450d-9cdd-9f9c7e95fe88-0_0-39-5633_20230823044858848.parquet             |GET       |3  |
   |my_table/site_id%253D21/.hoodie_partition_metadata                                                             |HEAD      |1  |
   |my_table/site_id%253D21/.hoodie_partition_metadata                                                             |GET       |1  |
   |my_table/site_id%253D21/42d99963-db7f-400f-9e33-d539c74672aa-0_0-79-6549_20230323023115954.parquet             |HEAD      |1  |
   |my_table/site_id%253D22/.hoodie_partition_metadata                                                             |HEAD      |1  |
   |my_table/site_id%253D22/.hoodie_partition_metadata                                                             |GET       |1  |
   |my_table/site_id%253D22/431ca0d1-8af3-4a72-bd17-31f2cd7e97e9-0_0-39-5633_20230323023017903.parquet             |HEAD      |1  |
   |my_table/site_id%253D23/.hoodie_partition_metadata                                                             |HEAD      |1  |
   |my_table/site_id%253D23/.hoodie_partition_metadata                                                             |GET       |1  |
   |my_table/site_id%253D23/36675fed-d8ab-4532-aedc-dddf0a32accb-0_1-80-6551_20230323024631265.parquet             |HEAD      |1  |
   |my_table/site_id%253D24/.hoodie_partition_metadata                                                             |HEAD      |1  |
   |my_table/site_id%253D24/.hoodie_partition_metadata                                                             |GET       |1  |
   |my_table/site_id%253D24/20efa6e4-489c-4b1a-a474-5dc1731485ed-0_0-80-6551_20230323041457900.parquet             |HEAD      |1  |
   |my_table/site_id%253D30/.hoodie_partition_metadata                                                             |HEAD      |1  |
   |my_table/site_id%253D30/.hoodie_partition_metadata                                                             |GET       |1  |
   |my_table/site_id%253D30/15bfe605-bced-4d17-b571-62ebe64c5e97-0_0-80-6552_20230823042339397.parquet             |HEAD      |1  |
   |my_table/site_id%253D30/27d907a2-7485-450d-9cdd-9f9c7e95fe88-0_0-39-5633_20230823044858848.parquet             |HEAD      |1  |
   +---------------------------------------------------------------------------------------------------------------+----------+---+
   ```
   
   Why are there so many `HEAD` calls? 
   Why are there multiple `GET` calls per object?
   
   I'm creating this ticket because we see a significant number of S3 calls
across our Hudi tables, which seems disproportionate to the number of queries we
run. These calls are starting to have non-negligible cost implications and have
even caused S3 throttling, which impacted the Hudi job runs.
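
For anyone who wants to produce a similar per-object breakdown for their own table, here is a minimal sketch. It assumes the S3 request records have already been extracted as `(path, httpMethod)` pairs; the `count_requests` helper and the sample records are hypothetical, and this is not necessarily how the table above was generated.

```python
from collections import Counter


def count_requests(records):
    """Count requests per (path, http_method) pair.

    `records` is any iterable of (path, http_method) tuples, for example
    rows parsed from S3 access logs (the parsing step is not shown here).
    """
    return Counter((path, method) for path, method in records)


# Hypothetical sample records, for illustration only.
sample = [
    ("my_table/.hoodie/hoodie.properties", "GET"),
    ("my_table/.hoodie/hoodie.properties", "GET"),
    ("my_table/.hoodie", "HEAD"),
]

counts = count_requests(sample)

# Print one line per (path, method) pair, sorted by path and then method.
for (path, method), cnt in sorted(counts.items()):
    print(f"{path}\t{method}\t{cnt}")
```

Grouping on both the path and the method keeps `HEAD` and `GET` requests for the same object on separate rows, which makes repeated reads of a single file easy to spot.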
   
   


