flyrain commented on PR #4888: URL: https://github.com/apache/iceberg/pull/4888#issuecomment-1163769266
Hi @aokolnychyi and @RussellSpitzer, vectorized read was enabled by default several months ago, but the benchmark still assumed it was disabled by default. I have set it to false explicitly and rerun the benchmark. We can now see a large performance gain for vectorized over non-vectorized reads, as the following diagram shows.

<img width="996" alt="Screen Shot 2022-06-22 at 4 22 09 PM" src="https://user-images.githubusercontent.com/1322359/175173362-e2e6d636-4e5c-4ed4-bcc5-a8888c6b1e1c.png">

I also profiled the benchmarks. Here is the flame graph for the 25% vectorized read. It looks normal to me. The program spends the majority of its time reading the position delete file and the data file. The position delete file is still read without vectorization, and that read takes up a large portion of the time. I would suggest enabling vectorized read on delete files to improve overall performance. That is probably the next step.

<img width="1543" alt="Screen Shot 2022-06-22 at 4 26 57 PM" src="https://user-images.githubusercontent.com/1322359/175176096-e52048aa-5524-49b8-9645-7cdfdc9463e7.png">
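
For reference, here is a minimal sketch of what "setting it explicitly" can look like. It assumes the flag in question is the Iceberg table property `read.parquet.vectorization.enabled`. The class and method names are hypothetical and are not the code in this PR. The point is only that the benchmark writes the flag for both cases, so it no longer depends on the library default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not the benchmark code in this PR).
final class BenchmarkReadProps {
    // Assumed to be the table property that toggles vectorized Parquet reads.
    static final String PARQUET_VECTORIZATION_ENABLED = "read.parquet.vectorization.enabled";

    private BenchmarkReadProps() {
    }

    // Always writes the flag, for both true and false, so the result
    // does not change if the library default changes again.
    static Map<String, String> readProps(boolean vectorized) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(PARQUET_VECTORIZATION_ENABLED, String.valueOf(vectorized));
        return props;
    }
}
```

In a benchmark setup, each entry of the returned map could then be applied to the table, for example through `table.updateProperties().set(key, value).commit()`, before the non-vectorized and vectorized runs.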
