stevenzwu commented on issue #7393: URL: https://github.com/apache/iceberg/issues/7393#issuecomment-1519324779
@huyuanfeng2018 thanks for the experiment. `DataStatisticsOrRecord` is the only way to pass statistics to the custom partitioner. Agree with you that Kryo serialization will be slower. We will need to provide a type serializer for this type, which we had in our internal PoC implementation and testing.

> resulting in a performance drop of more than 4 times

Can you elaborate on the observation of the 4x slowdown? What is the A/B test setup? In the benchmark with the internal PoC implementation, we observed 60% more CPU overhead for a simple job reading from Kafka and writing to an Iceberg table partitioned by event time. As expected, the bulk of the overhead comes from serdes and network I/O.

cc @yegangy0718
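To illustrate why a dedicated serializer can help: a "statistics or record" union can be encoded as one tag byte followed by the payload of whichever branch is present, with a layout fixed in code rather than discovered generically at runtime. This is a minimal Java sketch under stated assumptions: `StatsOrRecord`, the `String` record, and the `Map<String, Long>` statistics are hypothetical stand-ins, and it uses plain `java.io.DataOutput`/`DataInput` rather than Flink's `TypeSerializer` API or Iceberg's actual `DataStatisticsOrRecord` implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: all names here are hypothetical, not Iceberg or Flink APIs.
// Encodes a union of "statistics" or "record" as a tag byte plus payload.
final class StatsOrRecordSerializer {

    // Hypothetical union: exactly one of statistics or record is non-null.
    static final class StatsOrRecord {
        final Map<String, Long> statistics;
        final String record;
        private StatsOrRecord(Map<String, Long> statistics, String record) {
            this.statistics = statistics;
            this.record = record;
        }
        static StatsOrRecord ofStatistics(Map<String, Long> statistics) {
            return new StatsOrRecord(statistics, null);
        }
        static StatsOrRecord ofRecord(String record) {
            return new StatsOrRecord(null, record);
        }
    }

    // One tag byte tells the reader which branch follows.
    private static final byte TAG_RECORD = 0;
    private static final byte TAG_STATISTICS = 1;

    static void serialize(StatsOrRecord value, DataOutput out) throws IOException {
        if (value.statistics != null) {
            out.writeByte(TAG_STATISTICS);
            out.writeInt(value.statistics.size());
            for (Map.Entry<String, Long> entry : value.statistics.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        } else {
            out.writeByte(TAG_RECORD);
            out.writeUTF(value.record);
        }
    }

    static StatsOrRecord deserialize(DataInput in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TAG_STATISTICS: {
                int size = in.readInt();
                Map<String, Long> statistics = new LinkedHashMap<>();
                for (int i = 0; i < size; i++) {
                    String key = in.readUTF();
                    statistics.put(key, in.readLong());
                }
                return StatsOrRecord.ofStatistics(statistics);
            }
            case TAG_RECORD:
                return StatsOrRecord.ofRecord(in.readUTF());
            default:
                throw new IOException("Unknown tag: " + tag);
        }
    }

    static byte[] toBytes(StatsOrRecord value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            serialize(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static StatsOrRecord fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return deserialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] recordBytes = toBytes(StatsOrRecord.ofRecord("event-1"));
        // 1 tag byte + 2-byte writeUTF length + 7 ASCII bytes
        System.out.println(recordBytes.length); // prints 10
        Map<String, Long> stats = new LinkedHashMap<>();
        stats.put("2023-04-20", 42L);
        StatsOrRecord roundTrip = fromBytes(toBytes(StatsOrRecord.ofStatistics(stats)));
        System.out.println(roundTrip.statistics); // prints {2023-04-20=42}
    }
}
```

A real Flink `TypeSerializer` has more to implement than `serialize`/`deserialize` (for example `copy`, `duplicate`, and `snapshotConfiguration` for state compatibility); this sketch only shows the wire-format idea.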
