alexeykudinkin commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1079734532


Thanks for flagging this @YuweiXiao, great catch!

To summarize the issue: it is unfortunately a very sneaky one, and it was introduced accidentally during the refactoring of the AvroSerializer/Deserializer hierarchy in Hudi.

The crux of the issue is that the converter initializes the AvroSerializer/Deserializer upon _every_ invocation, because the initialization happens within the returned lambda itself (which also has the side effect of pulling the whole `SparkAdapter` into the closure):
   
```scala
def createAvroToInternalRowConverter(rootAvroType: Schema, rootCatalystType: StructType): GenericRecord => Option[InternalRow] =
  // Anti-pattern: a new deserializer is created on every invocation of the lambda
  record => sparkAdapter.createAvroDeserializer(rootAvroType, rootCatalystType)
    .deserialize(record)
    .map(_.asInstanceOf[InternalRow])
```
   
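Instead, it should have been:

```scala
def createAvroToInternalRowConverter(rootAvroType: Schema, rootCatalystType: StructType): GenericRecord => Option[InternalRow] = {
  // Create the deserializer once, when the converter itself is built
  val deserializer = sparkAdapter.createAvroDeserializer(rootAvroType, rootCatalystType)
  // The returned lambda only references the already-built deserializer, not `sparkAdapter`
  record => deserializer.deserialize(record).map(_.asInstanceOf[InternalRow])
}
```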
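To make the difference concrete, here is a minimal, self-contained sketch of the same pattern outside of Hudi. It assumes a hypothetical `ExpensiveDeserializer` that counts how many times it is constructed, and uses plain `String`s in place of Avro `GenericRecord`/`InternalRow`; `perCallConverter` and `hoistedConverter` are illustrative names, not Hudi or Spark APIs.

```scala
// Hypothetical stand-in for a deserializer that is costly to construct.
// Each construction bumps a counter in the companion object.
final class ExpensiveDeserializer(schema: String) {
  ExpensiveDeserializer.instances += 1
  def deserialize(record: String): Option[String] = Some(s"$schema:$record")
}

object ExpensiveDeserializer {
  var instances: Int = 0
}

object ConverterDemo {
  // Anti-pattern (mirrors the buggy converter): the deserializer is built
  // inside the lambda body, so every call constructs a new instance.
  def perCallConverter(schema: String): String => Option[String] =
    record => new ExpensiveDeserializer(schema).deserialize(record)

  // Fix (mirrors the corrected converter): build the deserializer once,
  // and let the returned lambda capture only that instance.
  def hoistedConverter(schema: String): String => Option[String] = {
    val deserializer = new ExpensiveDeserializer(schema)
    record => deserializer.deserialize(record)
  }

  def main(args: Array[String]): Unit = {
    val records = Seq("a", "b", "c")

    ExpensiveDeserializer.instances = 0
    val perCall = perCallConverter("s")
    records.foreach(perCall)
    println(s"per-call: ${ExpensiveDeserializer.instances} instances")

    ExpensiveDeserializer.instances = 0
    val hoisted = hoistedConverter("s")
    records.foreach(hoisted)
    println(s"hoisted: ${ExpensiveDeserializer.instances} instances")
  }
}
```

Converting three records prints 3 instances for the per-call version and 1 for the hoisted one. In the real converter, the cost difference scales with the number of rows being converted.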

