stevenzwu commented on issue #7393:
URL: https://github.com/apache/iceberg/issues/7393#issuecomment-1519324779

   @huyuanfeng2018 thanks for the experiment. `DataStatisticsOrRecord` is the only 
way to pass statistics to the custom partitioner. I agree that Kryo 
serialization will be slower. We will need to provide a dedicated type serializer 
for this type, which we had in our internal PoC implementation and testing.
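
For illustration only (this is not the PoC code): a minimal sketch of what a hand-written serializer for an "either statistics or record" type could look like, using plain `java.io` streams. The names `StatsOrRecord` and `StatsOrRecordCodec`, the `Map<String, Long>` statistics shape, and the `String` record payload are all hypothetical stand-ins. In Flink, the same write/read logic would presumably live in a `TypeSerializer` implementation operating on `DataOutputView`/`DataInputView` rather than in static helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for DataStatisticsOrRecord: exactly one field is non-null.
final class StatsOrRecord {
    final Map<String, Long> statistics; // key -> count; null when this holds a record
    final String record;                // placeholder payload; null when this holds statistics

    private StatsOrRecord(Map<String, Long> statistics, String record) {
        this.statistics = statistics;
        this.record = record;
    }

    static StatsOrRecord ofStatistics(Map<String, Long> statistics) {
        return new StatsOrRecord(statistics, null);
    }

    static StatsOrRecord ofRecord(String record) {
        return new StatsOrRecord(null, record);
    }

    boolean isStatistics() {
        return statistics != null;
    }
}

// Hand-written binary format: one tag byte, then only the fields of the active branch.
final class StatsOrRecordCodec {
    static void write(StatsOrRecord value, DataOutputStream out) throws IOException {
        out.writeBoolean(value.isStatistics()); // tag: true = statistics, false = record
        if (value.isStatistics()) {
            out.writeInt(value.statistics.size());
            for (Map.Entry<String, Long> e : value.statistics.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        } else {
            out.writeUTF(value.record);
        }
    }

    static StatsOrRecord read(DataInputStream in) throws IOException {
        if (in.readBoolean()) {
            int size = in.readInt();
            Map<String, Long> stats = new LinkedHashMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                stats.put(key, in.readLong());
            }
            return StatsOrRecord.ofStatistics(stats);
        }
        return StatsOrRecord.ofRecord(in.readUTF());
    }

    // Convenience wrappers for round-tripping through a byte array.
    static byte[] toBytes(StatsOrRecord value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static StatsOrRecord fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of a format like this is that each record carries only a one-byte tag on top of its own payload, with no class metadata or reflection on the hot path, which is where a generic Kryo fallback tends to cost more.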
   
   > resulting in a performance drop of more than 4 times
   
   Can you elaborate on the observed 4x slowdown? What was the A/B test 
setup?
   
   In the benchmark with the internal PoC implementation, we observed 60% more 
CPU overhead for a simple job reading from Kafka and writing to an Iceberg 
table partitioned by event time. As expected, the bulk of the overhead comes 
from serdes and network I/O.
   
   cc @yegangy0718 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

