alexeykudinkin commented on PR #6788: URL: https://github.com/apache/hudi/pull/6788#issuecomment-1258943919
@YannByron that's exactly the problem:

```
val row = deserializer.deserialize(originalAvroRecord).get
val row2 = deserializer.deserialize(originalAvroRecord2).get // deserialize originalAvroRecord2
assert(row != row2) // without this pr, row and row2 are the same object.
```

You're retaining a reference to an internal Row object returned by `AvroDeserializer`. Spark is unequivocally clear about this: if you want to retain a reference to a row you _have to_ copy it, since the row might be stored in a reusable buffer. If you don't copy it, you might be holding a reference to a buffer that will be overwritten by a subsequent invocation.

Unfortunately, this is a mistake that is all too easy to make with Spark and much harder to trace. We just need to be vigilant about it.
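To make the buffer-reuse point concrete, here is a minimal, dependency-free sketch. `MutableRow` and `ReusingDeserializer` are hypothetical stand-ins invented for illustration; they are not Spark or Hudi classes, and the real `AvroDeserializer` internals may differ. The sketch only shows the general pattern: a deserializer that hands back the same mutable object on every call, and how copying before retaining avoids the aliasing.

```scala
// Hypothetical stand-ins (NOT Spark/Hudi classes): a mutable row and a
// deserializer that reuses a single row instance across invocations.
final class MutableRow(var value: Int) {
  // Returns an independent snapshot of the current contents.
  def copy(): MutableRow = new MutableRow(value)
}

final class ReusingDeserializer {
  private val buffer = new MutableRow(0)

  // Overwrites the shared buffer and returns that same object every time.
  def deserialize(record: Int): Option[MutableRow] = {
    buffer.value = record
    Some(buffer)
  }
}

object RowReuseDemo {
  def main(args: Array[String]): Unit = {
    val deserializer = new ReusingDeserializer

    // Retaining the returned reference: both names alias the shared buffer.
    val row = deserializer.deserialize(1).get
    val row2 = deserializer.deserialize(2).get
    assert(row eq row2)    // same object
    assert(row.value == 2) // the first result has been overwritten

    // Copying before retaining: each result keeps its own contents.
    val kept = deserializer.deserialize(1).get.copy()
    val kept2 = deserializer.deserialize(2).get.copy()
    assert(kept ne kept2)
    assert(kept.value == 1 && kept2.value == 2)
  }
}
```

With Spark itself, the analogous step would presumably be calling `copy()` on the returned `InternalRow` before holding on to it, rather than changing the deserializer to allocate a new row per call.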
