[GitHub] [iceberg] huyuanfeng2018 opened a new issue, #7393: The serialization problem caused by Flink shuffling design

via GitHub Thu, 20 Apr 2023 23:09:01 -0700


huyuanfeng2018 opened a new issue, #7393:
URL: https://github.com/apache/iceberg/issues/7393


   ### Feature Request / Improvement
   
   This issue follows from #6303; I am opening a new issue to discuss it 
separately.
   
   I am very interested in this project. We currently have a serious data skew 
problem when writing with Iceberg, and I have been following the progress of 
this module. I would like to put forward some of my ideas.
   
   I took a close look at https://github.com/apache/iceberg/pull/6382 and 
https://github.com/apache/iceberg/pull/7269
   
   I think there are some problems. I completed a simple implementation based 
on these two PRs on my own branch, but the program's throughput dropped 
significantly, almost to the point of being unusable. So I think we should 
stop and consider whether this solution is suitable.
   
   From my observation, the problem lies in the DataStatisticsOperator. When 
output.collect is called there, Flink's serialization is forced to run, but 
DataStatisticsOrRecord falls back to Kryo during serialization, resulting in 
a performance drop of more than 4x.
   <img width="1433" alt="image" 
src="https://user-images.githubusercontent.com/40817998/233132909-209f9b69-1197-4088-8572-d30e2bbe7ea4.png">
   Too many computing resources are spent on serialization, so I think we may 
need to seriously reconsider the feasibility of this proposal.
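   
   To make the cost concrete: a generic fallback serializer has to work out 
how to encode the object at runtime, whereas a hand-written serializer for a 
"statistics or record" union only needs one tag byte followed by the payload. 
The following is a minimal sketch of that idea using only java.io. The names 
`StatsOrRecord` and `StatsOrRecordCodec` and both payload types are 
hypothetical; this is not Iceberg's DataStatisticsOrRecord, and it does not 
implement Flink's `TypeSerializer` interface, which is where such logic would 
have to live in a real job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a "statistics or record" union type.
final class StatsOrRecord {
    final boolean isStats;  // true: carries statistics, false: carries a record
    final long statsCount;  // meaningful only when isStats is true
    final String record;    // meaningful only when isStats is false

    private StatsOrRecord(boolean isStats, long statsCount, String record) {
        this.isStats = isStats;
        this.statsCount = statsCount;
        this.record = record;
    }

    static StatsOrRecord ofStats(long count) {
        return new StatsOrRecord(true, count, null);
    }

    static StatsOrRecord ofRecord(String record) {
        return new StatsOrRecord(false, 0L, record);
    }
}

// Hand-written codec: one tag byte, then only the fields of the active branch.
final class StatsOrRecordCodec {
    private static final byte TAG_STATS = 0;
    private static final byte TAG_RECORD = 1;

    static byte[] serialize(StatsOrRecord value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value.isStats) {
                out.writeByte(TAG_STATS);
                out.writeLong(value.statsCount);
            } else {
                out.writeByte(TAG_RECORD);
                out.writeUTF(value.record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static StatsOrRecord deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte tag = in.readByte();
            switch (tag) {
                case TAG_STATS:
                    return StatsOrRecord.ofStats(in.readLong());
                case TAG_RECORD:
                    return StatsOrRecord.ofRecord(in.readUTF());
                default:
                    throw new IOException("Unknown tag: " + tag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

   Calling `StatsOrRecordCodec.serialize(StatsOrRecord.ofStats(42L))` and 
passing the result to `deserialize` round-trips the value. Whether a 
serializer of this shape would recover the lost throughput in the shuffle 
operator is an assumption that would need to be measured.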
   
   @stevenzwu @hililiwei
   
   ### Query engine
   
   Flink


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


