huyuanfeng2018 commented on issue #6303:
URL: https://github.com/apache/iceberg/issues/6303#issuecomment-1514984945

   Hi, I am very interested in this project. We currently have a serious data 
skew problem when writing with Iceberg, and I have been following the progress 
of this module. I would like to share some of my thoughts.
   
   I took a close look at https://github.com/apache/iceberg/pull/6382 and 
https://github.com/apache/iceberg/pull/7269
   
   I think there are some problems. I built a simple implementation based on 
these two PRs on my own branch, but the throughput of the job dropped 
significantly, almost to the point of being unusable. So I think we should stop 
and consider whether this solution is suitable.
   
   From my observation, the problem lies in the DataStatisticsOperator. Calling 
output.collect there forces Flink to serialize the element, but 
DataStatisticsOrRecord falls back to Kryo during serialization, which results 
in a performance drop of more than 4x.
   <img width="1433" alt="image" 
src="https://user-images.githubusercontent.com/40817998/233132909-209f9b69-1197-4088-8572-d30e2bbe7ea4.png">
   Too many computing resources are spent on serialization, so I think we need 
to seriously reconsider the feasibility of this approach.
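
   To make the concern concrete, here is a rough sketch of what a hand-written 
encoding for such a "statistics or record" union could look like, in plain Java 
with no Flink dependency. The names StatsOrRecord and StatsOrRecordCodec, and 
the payload shapes (a String row and a key-to-count map), are hypothetical and 
are not taken from either PR. In a real job this logic would presumably sit 
behind a custom Flink TypeSerializer so the type no longer falls back to Kryo; 
that wiring is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the union type emitted by the statistics operator:
// exactly one of `stats` or `record` is non-null.
final class StatsOrRecord {
    final Map<String, Long> stats; // assumed statistics payload: key -> count
    final String record;           // assumed row payload

    private StatsOrRecord(Map<String, Long> stats, String record) {
        this.stats = stats;
        this.record = record;
    }

    static StatsOrRecord ofStats(Map<String, Long> stats) {
        return new StatsOrRecord(new TreeMap<>(stats), null);
    }

    static StatsOrRecord ofRecord(String record) {
        return new StatsOrRecord(null, record);
    }

    boolean isRecord() {
        return record != null;
    }
}

// Hand-written codec: one tag byte, then only the fields of the active branch.
final class StatsOrRecordCodec {
    private static final byte TAG_STATS = 0;
    private static final byte TAG_RECORD = 1;

    static byte[] encode(StatsOrRecord value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value.isRecord()) {
                out.writeByte(TAG_RECORD);
                out.writeUTF(value.record);
            } else {
                out.writeByte(TAG_STATS);
                out.writeInt(value.stats.size());
                for (Map.Entry<String, Long> entry : value.stats.entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeLong(entry.getValue());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static StatsOrRecord decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte tag = in.readByte();
            if (tag == TAG_RECORD) {
                return StatsOrRecord.ofRecord(in.readUTF());
            }
            if (tag != TAG_STATS) {
                throw new IOException("unknown tag " + tag);
            }
            int size = in.readInt();
            Map<String, Long> stats = new TreeMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                stats.put(key, in.readLong());
            }
            return StatsOrRecord.ofStats(stats);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

   Calling StatsOrRecordCodec.decode(StatsOrRecordCodec.encode(value)) should 
give back an equivalent value. Whether an explicit serializer like this would 
recover the lost throughput in the actual operator is something I have not 
measured.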
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

