wwj6591812 commented on PR #6914: URL: https://github.com/apache/paimon/pull/6914#issuecomment-3755146184
> Hi @wwj6591812
>
> 1. We can optimize the bucket path function (this is the performance bottleneck) and test its performance to see whether the optimization is effective without pushing the limit down to the scan.
> 2. If the performance improvement is not obvious, the two methods postFilterManifestEntries and limitPushManifestEntries need to be merged, keeping only postFilterManifestEntries.

Hi, I added a cache to test this.

I. Test results:

1. Append Table

<img width="2008" height="968" alt="image" src="https://github.com/user-attachments/assets/98b5748a-b324-489c-8c2a-2447c45f355e" />

2. PK Table

<img width="2004" height="956" alt="image" src="https://github.com/user-attachments/assets/38d9cd64-30f7-41ba-b388-9af93329fe20" />

II. After testing, I made the following changes:

1. Performance can degrade when the cache hit rate is low. Therefore, instead of adding a cache for bucketPath construction, I moved this logic out of the for loop to reduce the number of calls to FileStorePathFactory#bucketPath.
2. I merged the limitPushManifestEntries method into postFilterManifestEntries, keeping only the latter.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
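
For illustration, moving the bucketPath call out of the for loop might look roughly like the sketch below. It is not the code from the PR: the class, the `bucketPath` stand-in, and the call counter are all hypothetical, and it assumes every file handled by the loop belongs to the same partition and bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are taken from the Paimon code base.
class BucketPathHoisting {

    // Counts how often the (assumed expensive) path construction runs.
    static int bucketPathCalls = 0;

    // Hypothetical stand-in for FileStorePathFactory#bucketPath.
    static String bucketPath(String partition, int bucket) {
        bucketPathCalls++;
        return partition + "/bucket-" + bucket;
    }

    // Before: the bucket path is rebuilt for every file inside the loop.
    static List<String> pathsInsideLoop(String partition, int bucket, List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String fileName : fileNames) {
            result.add(bucketPath(partition, bucket) + "/" + fileName);
        }
        return result;
    }

    // After: the bucket path is built once, before the loop, and reused.
    // Assumption: all files in this loop share the same partition and bucket.
    static List<String> pathsHoisted(String partition, int bucket, List<String> fileNames) {
        String dir = bucketPath(partition, bucket);
        List<String> result = new ArrayList<>();
        for (String fileName : fileNames) {
            result.add(dir + "/" + fileName);
        }
        return result;
    }
}
```

Unlike a cache, this approach has no hit rate to depend on: the number of bucketPath calls drops from one per file to one per loop regardless of the access pattern.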
