qjqqyy commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077384564


   @YuweiXiao Hi, allow me to rephrase. This is effectively what happens on git 
master once the adapter abstraction is stripped away:
   
   ```scala
   df.mapPartitions { rows => 
     rows.map { row =>
       new AvroSerializer().serialize(row)
     }
   }
   ```
   
   Each invocation of `serialize` may carry some extra initialization cost, 
but the code appears to be *recreating* the AvroSerializer for every row, which 
I suspect is the main culprit.
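   
   To illustrate the suspicion, here is a minimal sketch that runs without 
Spark. `StubSerializer`, `constructed`, `perRow` and `perPartition` are 
hypothetical names made up for this example. The stub only counts how many 
times it is constructed, and a plain `Iterator` stands in for the rows of one 
partition. It assumes the real serializer can safely be reused across the rows 
of a partition, which I have not verified.
   
   ```scala
   object SerializerReuseSketch {
     // Number of serializer instances built so far (illustration only).
     var constructed = 0

     // Hypothetical stand-in for AvroSerializer: construction is the "expensive" part.
     final class StubSerializer {
       constructed += 1
       def serialize(row: String): Array[Byte] = row.getBytes("UTF-8")
     }

     // Same shape as the snippet above: a new serializer for every row.
     def perRow(rows: Iterator[String]): List[Array[Byte]] =
       rows.map(row => new StubSerializer().serialize(row)).toList

     // Hoisted variant: one serializer per partition, reused for each row.
     def perPartition(rows: Iterator[String]): List[Array[Byte]] = {
       val serializer = new StubSerializer()
       rows.map(row => serializer.serialize(row)).toList
     }

     def main(args: Array[String]): Unit = {
       val partition = List("a", "b", "c")

       constructed = 0
       perRow(partition.iterator)
       println(s"per-row variant built $constructed serializer(s)")

       constructed = 0
       perPartition(partition.iterator)
       println(s"per-partition variant built $constructed serializer(s)")
     }
   }
   ```
   
   With three rows, the per-row variant constructs three serializers and the 
hoisted variant constructs one. In the real code the equivalent change would be 
to create the serializer once inside `mapPartitions`, before `rows.map`.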


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

