huyuanfeng2018 opened a new issue, #7393: URL: https://github.com/apache/iceberg/issues/7393
### Feature Request / Improvement

This issue follows from #6303; I am opening a new issue to discuss it.

I am very interested in this project. We currently have a serious data skew problem when writing with Iceberg, and I have been following the progress of this module. I would like to put forward some of my thoughts.

I took a close look at https://github.com/apache/iceberg/pull/6382 and https://github.com/apache/iceberg/pull/7269, and I think there are some problems. I completed a simple implementation based on these two PRs on my own branch, but the throughput of the job dropped significantly, almost to the point of being unusable. So I think we should stop and consider whether this solution is suitable.

From my observation, the problem lies in the `DataStatisticsOperator`. Calling `output.collect` there forces Flink serialization, and `DataStatisticsOrRecord` falls back to Kryo during serialization, which results in a performance drop of more than 4 times.

<img width="1433" alt="image" src="https://user-images.githubusercontent.com/40817998/233132909-209f9b69-1197-4088-8572-d30e2bbe7ea4.png">

Too many computing resources are spent on serialization, so I think we may need to seriously reconsider the feasibility of this proposal. @stevenzwu @hililiwei

### Query engine

Flink
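To make the serialization concern concrete: Flink falls back to Kryo for types it cannot analyze as one of its natively supported kinds, and a common way to avoid that cost is a hand-written serializer for the type. The sketch below is not the Iceberg or Flink implementation. It uses only the JDK and a hypothetical `StatsOrRecordCodec` class to show the general idea of encoding a "statistics or record" union as a one-byte tag followed by a fixed, explicit payload layout. The payload types (a `String` record and a `Map<String, Long>` of key counts) are assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a "statistics or record" union codec; not an Iceberg class. */
final class StatsOrRecordCodec {
  private static final byte TAG_RECORD = 0;
  private static final byte TAG_STATISTICS = 1;

  private StatsOrRecordCodec() {}

  /** Encodes a record as: tag byte, then the record as a length-prefixed UTF string. */
  static byte[] encodeRecord(String record) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(TAG_RECORD);
      out.writeUTF(record);
    }
    return bytes.toByteArray();
  }

  /** Encodes statistics as: tag byte, entry count, then (key, count) pairs. */
  static byte[] encodeStatistics(Map<String, Long> keyCounts) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(TAG_STATISTICS);
      out.writeInt(keyCounts.size());
      for (Map.Entry<String, Long> entry : keyCounts.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeLong(entry.getValue());
      }
    }
    return bytes.toByteArray();
  }

  /** Reads the tag byte and returns either a String record or a Map of key counts. */
  static Object decode(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      byte tag = in.readByte();
      if (tag == TAG_RECORD) {
        return in.readUTF();
      } else if (tag == TAG_STATISTICS) {
        int size = in.readInt();
        Map<String, Long> keyCounts = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
          String key = in.readUTF();
          keyCounts.put(key, in.readLong());
        }
        return keyCounts;
      }
      throw new IOException("Unknown tag: " + tag);
    }
  }
}
```

In a real Flink job the same layout would live inside a custom `TypeSerializer` registered through the operator's output `TypeInformation`, so that `output.collect` never reaches the generic Kryo path. Whether that removes the slowdown reported here would need to be measured.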
