huyuanfeng2018 commented on issue #7393:
URL: https://github.com/apache/iceberg/issues/7393#issuecomment-1519739497

   Hi @stevenzwu @hililiwei,
   Thank you for your reply!
   My scenario is that server logs are written to Iceberg in real time, and 
the real-time data volume peaks at about 1.0M/s.
   <img width="456" alt="image" 
src="https://user-images.githubusercontent.com/40817998/233956036-25ef3566-3e6a-4d11-b8df-b7b7b171177a.png">
   The table is currently partitioned by day, hour, and an enumeration field. 
The enumeration has about 70 values, and two of them account for more than 70% 
of the total data, so Iceberg's current write mode cannot meet our 
requirements. We currently run 200 parallel writers in production and shuffle 
records according to ratios that we define ourselves, where each ratio is the 
share of one enumeration value in the total data volume. The ratios are 
specified like this:
   
   'distribution-balance-column-ratio' = 
'sysdk_android:0.0005,_wap:0.0003,android_tv:0.003.......'
   
   However, the share of each enumeration value changes during certain time 
periods, so the data is still skewed in those periods, and my job builds up a 
backlog.
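
   A minimal sketch of the static-ratio idea described above, written in 
Python for brevity. It is not the actual implementation: the function names 
`parse_ratios`, `allocate_writers`, and `route` are hypothetical, and the 
allocation rule (writers proportional to the normalized ratio, at least one 
per value, round-robin within a value's writers) is an assumption.

```python
from typing import Dict, List


def parse_ratios(spec: str) -> Dict[str, float]:
    """Parse a 'key:ratio,key:ratio' string into a dict of floats."""
    ratios: Dict[str, float] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last ':' so keys containing ':' still parse.
        key, _, value = item.rpartition(":")
        ratios[key] = float(value)
    return ratios


def allocate_writers(ratios: Dict[str, float], parallelism: int) -> Dict[str, List[int]]:
    """Assign each key a block of writer subtasks proportional to its ratio.

    Assumption: ratios are normalized by their sum, and every key gets at
    least one writer. Blocks wrap around, so writers may be shared.
    """
    total = sum(ratios.values())
    assignment: Dict[str, List[int]] = {}
    next_writer = 0
    for key, ratio in ratios.items():
        count = max(1, round(ratio / total * parallelism))
        count = min(count, parallelism)
        assignment[key] = [(next_writer + i) % parallelism for i in range(count)]
        next_writer = (next_writer + count) % parallelism
    return assignment


def route(key: str, seq: int, assignment: Dict[str, List[int]]) -> int:
    """Pick a writer for a record by cycling through the key's writers."""
    writers = assignment[key]
    return writers[seq % len(writers)]
```

   Because the assignment is computed once from fixed ratios, a value whose 
real share grows beyond its configured ratio overloads its writers, which is 
the skew described above.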
   
   So I tried to implement automatic balancing, but with the same cluster 
configuration my throughput was about 4 times lower, roughly 200~300k/s. I 
have attached a flame graph; most of the processing time is spent serializing 
the statistics records. If you re-implement the serialization interface for 
these records, could you give me a sample so that I can test it in my scenario 
and see how much it improves? Also, if needed, I am happy to help in any way I 
can.
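
   To make the automatic-balancing idea concrete, here is a rough sketch in 
Python. It is not Iceberg or Flink code: the `KeyStats` class and its byte 
layout are hypothetical. It assumes statistics can be aggregated locally as 
per-value counts, so that one compact summary is serialized per reporting 
interval instead of one statistics record per data record.

```python
import struct
from collections import Counter
from typing import Dict


class KeyStats:
    """Per-value record counts collected by one subtask (hypothetical)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def observe(self, key: str, n: int = 1) -> None:
        """Count n records for the given enumeration value."""
        self.counts[key] += n

    def merge(self, other: "KeyStats") -> None:
        """Add the counts reported by another subtask."""
        self.counts.update(other.counts)

    def ratios(self) -> Dict[str, float]:
        """Return each value's share of all observed records."""
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {key: count / total for key, count in self.counts.items()}

    def to_bytes(self) -> bytes:
        """Encode as: entry count, then (key length, key bytes, count) per entry."""
        parts = [struct.pack(">I", len(self.counts))]
        for key, count in self.counts.items():
            raw = key.encode("utf-8")
            parts.append(struct.pack(">H", len(raw)))
            parts.append(raw)
            parts.append(struct.pack(">Q", count))
        return b"".join(parts)

    @classmethod
    def from_bytes(cls, data: bytes) -> "KeyStats":
        """Decode the layout written by to_bytes."""
        stats = cls()
        (entries,) = struct.unpack_from(">I", data, 0)
        offset = 4
        for _ in range(entries):
            (key_len,) = struct.unpack_from(">H", data, offset)
            offset += 2
            key = data[offset:offset + key_len].decode("utf-8")
            offset += key_len
            (count,) = struct.unpack_from(">Q", data, offset)
            offset += 8
            stats.counts[key] = count
        return stats
```

   With about 70 enumeration values, a summary like this stays small, and the 
merged `ratios()` result could replace the hand-maintained ratio string. 
Whether this removes the serialization hot spot seen in the flame graph would 
need to be measured.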


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

