spyzzz commented on issue #2175: URL: https://github.com/apache/hudi/issues/2175#issuecomment-709876261
After some digging I finally found the cause. I first tried a plain read and write without any transformation, and it was much faster (around 500K messages in 30s). I then added the steps back one at a time to find the bottleneck, and it turned out to be my Avro deserialization:

```
xxx.readStream.selectExpr("deserialize(value) as message")
```

So I found a better way to deserialize Avro messages that use the Confluent (io.confluent) Schema Registry:

```
xxx.readStream.select(from_avro(col("value"), schema))
```

Now I can read and write 500K messages into Hudi in 1.5 min. That's way better...
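
For anyone trying the same `from_avro` switch: a rough sketch of one detail that may matter, assuming the Kafka values were written by a Confluent serializer. Such values start with a 5-byte header (a magic byte `0` followed by a 4-byte big-endian schema id) ahead of the Avro payload, and a plain Avro decoder expects only the payload. Whether this framing applied to the setup in this comment is an assumption, and `split_confluent_message` is a hypothetical helper name, not part of Spark, Hudi, or Confluent.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # first byte of the Confluent wire format
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema id


def split_confluent_message(value: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: return (schema_id, avro_payload) for a framed value."""
    if len(value) < HEADER_SIZE:
        raise ValueError("message shorter than the 5-byte Confluent header")
    # ">bI" = big-endian, one signed byte, one unsigned 32-bit integer (5 bytes total)
    magic, schema_id = struct.unpack(">bI", value[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, value[HEADER_SIZE:]


# Illustrative framed value: schema id 42 followed by made-up payload bytes.
framed = struct.pack(">bI", MAGIC_BYTE, 42) + b"\x02\x06foo"
schema_id, payload = split_confluent_message(framed)
```

In a Spark job the same idea would be to drop the first five bytes of the `value` column before decoding it; the helper above only illustrates the framing and is not meant to run per row in a stream.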