flyrain commented on PR #4683: URL: https://github.com/apache/iceberg/pull/4683#issuecomment-1129573060
Here are benchmarks of equality deletes and position deletes for a data file with 10M rows, with the percentage of deleted rows varying from 0 to 100%. Reads without the is_deleted column perform the same as reads with "_deleted = false", which is expected. The grey line is for reads of deleted rows; its output grows as the percentage of deleted rows increases. The performance chart also makes sense, although there is room to optimize when only a few rows are deleted.

<img width="1110" alt="Screen Shot 2022-05-17 at 9 58 58 PM" src="https://user-images.githubusercontent.com/1322359/168960450-63ab05bf-3fad-4c40-bc39-06967c35ac50.png">

<img width="1011" alt="Screen Shot 2022-05-17 at 10 01 04 PM" src="https://user-images.githubusercontent.com/1322359/168960699-c1de06f9-b020-4d43-840a-1715a4b7f891.png">
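To make the expected shape of these curves concrete, here is a toy model of the three read modes for position deletes. It is a minimal sketch, not Iceberg's implementation: the function names `read_rows` and `deleted_positions_for`, the dictionary keys, and the choice to delete the first N positions are all hypothetical. It only illustrates why a plain read and a `_deleted = false` read return the same rows, while a read of deleted rows returns more rows as the delete percentage rises.

```python
from typing import Dict, List, Set


def deleted_positions_for(num_rows: int, percent: int) -> Set[int]:
    """Hypothetical delete distribution: delete the first `percent`% of row positions."""
    return set(range(num_rows * percent // 100))


def read_rows(num_rows: int, deleted_positions: Set[int]) -> Dict[str, List[int]]:
    """Toy model of three read modes over one data file (not Iceberg's code)."""
    # Attach a hypothetical per-row deleted flag, like projecting a _deleted column.
    flagged = [(pos, pos in deleted_positions) for pos in range(num_rows)]
    return {
        # Plain read without the flag column: deleted rows are simply dropped.
        "no_deleted_column": [pos for pos in range(num_rows) if pos not in deleted_positions],
        # Projecting the flag and keeping "_deleted = false" yields the same rows.
        "deleted_false": [pos for pos, is_deleted in flagged if not is_deleted],
        # Keeping only deleted rows: output size grows with the delete percentage.
        "deleted_true": [pos for pos, is_deleted in flagged if is_deleted],
    }


if __name__ == "__main__":
    n = 1000
    for percent in (0, 5, 50, 100):
        out = read_rows(n, deleted_positions_for(n, percent))
        print(percent, len(out["no_deleted_column"]), len(out["deleted_false"]), len(out["deleted_true"]))
```

The model ignores I/O and vectorization, so it says nothing about absolute timings; it only shows which rows each read mode is expected to emit.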
