CaesarWangX opened a new issue, #11201:
URL: https://github.com/apache/hudi/issues/11201

   **Describe the problem you faced**
   
   Our batch job reads from a Hudi table registered in Hive, processes the 
data, and writes to a new Hudi table with Hive sync enabled. It consistently 
triggers a 'listing all partitions' operation on the source table, which 
significantly slows down our tasks and serves no meaningful purpose for us. 
While debugging, we found that even with 
'hoodie.datasource.read.file.index.listing.mode=lazy' set (which is already 
the default value), the listing mode still ends up as 'eager'. We eventually 
traced this to HoodieFileIndex, where the mode is forcibly changed to 
'eager'.
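
For reference, here is a minimal sketch of how the option is passed on the read side, assuming a PySpark job. The helper name `hudi_read_options` and the table path are hypothetical; only the option key and its two values ('lazy' and 'eager') are taken from the description above.

```python
# Option key as quoted in this issue.
LISTING_MODE_KEY = "hoodie.datasource.read.file.index.listing.mode"


def hudi_read_options(listing_mode: str = "lazy") -> dict:
    """Build the reader options dict (hypothetical helper, for illustration only)."""
    if listing_mode not in ("lazy", "eager"):
        raise ValueError(f"unsupported listing mode: {listing_mode!r}")
    return {LISTING_MODE_KEY: listing_mode}


opts = hudi_read_options()

# With an existing SparkSession named `spark` (assumed) and a placeholder path,
# the options would be applied like this:
# df = spark.read.format("hudi").options(**opts).load("/path/to/source_table")
```

As described above, setting this explicitly did not change the behavior for us, because the mode is overridden later inside HoodieFileIndex.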
   
   
![image](https://github.com/apache/hudi/assets/12985552/afce6ca9-09b2-47f7-82a4-e7803ffc0bba)
   
   <img width="1354" alt="image" 
src="https://github.com/apache/hudi/assets/12985552/383c0b11-0076-473f-add8-967b3b18c77f">
   
   
   
   **Environment Description**
   
   * Hudi version : 0.14
   
   
   **Additional context**
   
   After modifying the source code as shown in the screenshot below, the job 
no longer lists all partitions, and the output results are still correct.
   
   <img width="1326" alt="image" 
src="https://github.com/apache/hudi/assets/12985552/28a83788-36be-45a1-bfd6-ad77a9a33941">
   
   

