[GitHub] [hudi] danny0405 commented on issue #5107: [SUPPORT] High performance costs of AvroSerializer in Datasource writing

GitBox Thu, 24 Mar 2022 03:21:43 -0700


danny0405 commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077466306



   > @boneanxs True, full support of Dataset is the long-term solution. In my 
experiment, optimizing the usage of `AvroSerializer` could save 80% of the cost 
of reading the source data. But the optimization requires modifying the 
`AvroSerializer` source code on the Spark side.
   > 
   > @qjqqyy Yes, a new `AvroSerializer` is initialized for each row (see the 
variables in the lambda named `converter`).
   
   Can we copy `AvroSerializer` to the Hudi side for now and just use that?
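
As a rough illustration of the cost pattern discussed in the quoted reply (a serializer built once per row versus built once and reused), here is a minimal sketch. It does not use Spark or Hudi code: `CountingSerializer`, `convert_per_row`, and `convert_reusing` are hypothetical names invented for this example, and the "schema" is just a list of field names.

```python
class CountingSerializer:
    """Hypothetical stand-in for a serializer that is expensive to construct."""

    instances = 0  # counts how many serializers have been built

    def __init__(self, schema):
        CountingSerializer.instances += 1
        self.fields = tuple(schema)

    def serialize(self, row):
        # Project the row onto the schema fields.
        return {name: row[name] for name in self.fields}


def convert_per_row(rows, schema):
    # Mirrors the reported problem: a new serializer is built for every row.
    return [CountingSerializer(schema).serialize(row) for row in rows]


def convert_reusing(rows, schema):
    # Builds the serializer once and reuses it for all rows.
    serializer = CountingSerializer(schema)
    return [serializer.serialize(row) for row in rows]
```

Both functions produce the same output; only the number of serializer constructions differs, which is the overhead the thread attributes to per-row initialization.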


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

