alexeykudinkin commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1079734532
Thanks for flagging this @YuweiXiao, great catch!
To summarize the issue: it is unfortunately a very sneaky one, introduced
accidentally during the refactoring of the AvroSerializer/Deserializer
hierarchy in Hudi.
The crux is that the converter initializes the AvroSerializer/Deserializer
upon _every_ invocation, b/c the initialization happens w/in the returned
lambda itself (this also has the side effect of pulling the whole
`SparkAdapter` into the closure):
```
def createAvroToInternalRowConverter(rootAvroType: Schema,
                                     rootCatalystType: StructType): GenericRecord => Option[InternalRow] =
  record => sparkAdapter.createAvroDeserializer(rootAvroType, rootCatalystType)
    .deserialize(record)
    .map(_.asInstanceOf[InternalRow])
```
Instead, it should have been:
```
def createAvroToInternalRowConverter(rootAvroType: Schema,
                                     rootCatalystType: StructType): GenericRecord => Option[InternalRow] = {
  // Build the deserializer once, outside the lambda, so every record reuses it
  val deserializer = sparkAdapter.createAvroDeserializer(rootAvroType, rootCatalystType)
  record => deserializer.deserialize(record).map(_.asInstanceOf[InternalRow])
}
```
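To make the difference concrete, here is a minimal standalone sketch that
counts how many deserializers each shape constructs. It does not use Hudi or
Spark: `CountingDeserializer`, `ConverterSketch` and its methods are
hypothetical stand-ins, and records are plain strings, so treat it as an
illustration of the pattern rather than the actual code path.

```scala
// Hypothetical stand-in for an expensive-to-build deserializer (not Hudi's API).
final class CountingDeserializer(onCreate: () => Unit) {
  onCreate() // report every construction to the caller
  def deserialize(record: String): Option[String] = Some(record.toUpperCase)
}

object ConverterSketch {
  var created = 0 // number of deserializers constructed so far

  // Problematic shape: the deserializer is built inside the lambda, once per record.
  def perCallConverter(): String => Option[String] =
    record => new CountingDeserializer(() => created += 1).deserialize(record)

  // Fixed shape: the deserializer is built once and only captured by the lambda.
  def hoistedConverter(): String => Option[String] = {
    val deserializer = new CountingDeserializer(() => created += 1)
    record => deserializer.deserialize(record)
  }

  // Builds a converter via `factory`, runs it over `records`, and returns
  // how many deserializers were constructed along the way.
  def countCreations(factory: () => String => Option[String], records: Seq[String]): Int = {
    val before = created
    val converter = factory()
    records.foreach(converter)
    created - before
  }
}
```

With three records, `countCreations(() => ConverterSketch.perCallConverter(), ...)`
constructs three deserializers, while the hoisted variant constructs one
regardless of how many records are processed.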