linliu-code opened a new pull request, #9995: URL: https://github.com/apache/hudi/pull/9995
### Change Logs

When building the profile, the Spark driver only needs the data distribution over (partition, instant_time, file_id), not over (partition, instant_time, file_id, record_position).

Tests:

1. Without removing the record position, the driver consistently ran out of memory (OOM).
2. After removing the record position, both the 500GB and 1TB queries finished successfully.

### Impact

This fix removes a stability regression affecting large queries.

### Risk level (write none, low medium or high below)

Low.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
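
### Illustration (sketch, not the PR's code)

To make the memory argument concrete, here is a minimal plain-Python sketch of why the grouping key matters. It is not the Hudi or Spark implementation: `RecordLocation`, `sample_locations`, and both `profile_*` functions are hypothetical names introduced only for this illustration. The point it shows is that counting by a key that includes the record position produces one entry per record, while counting by (partition, instant_time, file_id) produces one entry per file group, which is what bounds the size of the result gathered on the driver.

```python
from collections import Counter
from typing import List, NamedTuple, Sequence, Tuple


class RecordLocation(NamedTuple):
    """Hypothetical stand-in for a record's location; not a Hudi class."""
    partition: str
    instant_time: str
    file_id: str
    position: int  # offset of the record inside its file


def sample_locations(partitions: Sequence[str],
                     files_per_partition: int,
                     records_per_file: int,
                     instant_time: str = "20231106000000") -> List[RecordLocation]:
    """Generate synthetic locations: every record has a distinct position."""
    return [
        RecordLocation(partition, instant_time, f"file-{f}", pos)
        for partition in partitions
        for f in range(files_per_partition)
        for pos in range(records_per_file)
    ]


def profile_with_position(locations: Sequence[RecordLocation]) -> Counter:
    """Count by the full 4-tuple: one key per record, so size grows with data volume."""
    return Counter(locations)


def profile_without_position(
        locations: Sequence[RecordLocation]) -> "Counter[Tuple[str, str, str]]":
    """Count by (partition, instant_time, file_id): one key per file group."""
    return Counter(
        (loc.partition, loc.instant_time, loc.file_id) for loc in locations
    )


if __name__ == "__main__":
    locs = sample_locations(["2023/11/01", "2023/11/02"], 3, 1000)
    print("keys with position:   ", len(profile_with_position(locs)))
    print("keys without position:", len(profile_without_position(locs)))
```

Both profiles account for the same total number of records; only the number of distinct keys differs. In this sketch the per-file-group counts are still available after dropping the position, which matches the PR's claim that the driver only needs the distribution at file-group granularity.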
