shidayang opened a new issue, #5245: URL: https://github.com/apache/iceberg/issues/5245
We ran CH-benCHmark against Iceberg on Trino and found that merge-on-read (MOR) performance is very poor when a table has many delete files. The data scale is 10 warehouses. Without delete files, the average query takes less than 10 seconds; after we added some delete files to every table, some queries took over an hour.

We found two bottlenecks:

1. #5195: Trino calls `DeleteFilter#filter` for every page, and every call to `DeleteFilter#filter` initializes the delete files again.
2. #5244, #5242: Creating `StructLikeWrapper` and `InternalRecordWrapper` objects is expensive. Flame graph:

   <img width="1410" alt="image" src="https://user-images.githubusercontent.com/26699250/178226456-9e953b2b-5154-4693-9b74-2ec9f277fd97.png">

Query performance improved once we made these optimizations. For example, `select count(*) from stock` took 8 minutes before the optimizations and only 20 seconds after.
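The two bottlenecks suggest two general techniques: load the equality-delete set once per filter instance rather than once per page, and reuse a single mutable key wrapper for per-row lookups instead of allocating one per row. The Java sketch below illustrates both under a deliberately simplified model. `DeleteFilterSketch`, `KeyWrapper`, `loadCount` and the `Object[]` row representation are hypothetical and are not Iceberg or Trino APIs, and this does not reproduce the code in #5195, #5242 or #5244.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.IntStream;

// Hypothetical, simplified model; not Iceberg's DeleteFilter. Not thread-safe.
final class DeleteFilterSketch {

    // Hypothetical stand-in for a StructLikeWrapper-style key: equality and
    // hashing over selected columns of a row, re-pointable via wrap().
    static final class KeyWrapper {
        private final int[] keyFields; // positions of the equality-delete columns
        private Object[] row;

        KeyWrapper(int[] keyFields) { this.keyFields = keyFields; }

        KeyWrapper wrap(Object[] row) { this.row = row; return this; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof KeyWrapper)) return false;
            KeyWrapper other = (KeyWrapper) o;
            if (keyFields.length != other.keyFields.length) return false;
            for (int i = 0; i < keyFields.length; i++) {
                if (!Objects.equals(row[keyFields[i]], other.row[other.keyFields[i]])) return false;
            }
            return true;
        }

        @Override public int hashCode() {
            int h = 1;
            for (int f : keyFields) h = 31 * h + Objects.hashCode(row[f]);
            return h;
        }
    }

    private final Supplier<List<Object[]>> deleteRowLoader; // reads delete files (expensive)
    private final int[] deleteKeyFields;  // delete rows hold only the key columns
    private final KeyWrapper lookupKey;   // one wrapper reused for every data row
    private Set<KeyWrapper> deleteSet;    // built lazily, at most once
    int loadCount = 0;                    // for illustration: how often deletes were loaded

    DeleteFilterSketch(int[] dataKeyFields, Supplier<List<Object[]>> deleteRowLoader) {
        this.deleteRowLoader = deleteRowLoader;
        this.lookupKey = new KeyWrapper(dataKeyFields);
        this.deleteKeyFields = IntStream.range(0, dataKeyFields.length).toArray();
    }

    // Load the delete set on first use only; later pages reuse it.
    private Set<KeyWrapper> deleteSet() {
        if (deleteSet == null) {
            Set<KeyWrapper> keys = new HashSet<>();
            for (Object[] deleteRow : deleteRowLoader.get()) {
                keys.add(new KeyWrapper(deleteKeyFields).wrap(deleteRow));
            }
            deleteSet = keys;
            loadCount++;
        }
        return deleteSet;
    }

    // Called once per page; returns the rows that are not deleted.
    List<Object[]> filter(List<Object[]> page) {
        Set<KeyWrapper> deletes = deleteSet();
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : page) {
            // lookupKey is never inserted into the set, so re-pointing it is safe.
            if (!deletes.contains(lookupKey.wrap(row))) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Rows are {id, name}; equality deletes on column 0 (id).
        DeleteFilterSketch filter = new DeleteFilterSketch(
                new int[] {0}, () -> List.of(new Object[] {2}, new Object[] {4}));
        System.out.println(filter.filter(List.of(new Object[] {1, "a"}, new Object[] {2, "b"})).size()); // 1
        System.out.println(filter.filter(List.of(new Object[] {3, "c"}, new Object[] {4, "d"})).size()); // 1
        System.out.println(filter.loadCount); // 1
    }
}
```

Across two pages the delete rows are read once, and each data row costs one hash lookup with no new wrapper allocation. The per-delete-row wrappers are still created, but only once, when the set is built.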
