alexeykudinkin commented on PR #6788:
URL: https://github.com/apache/hudi/pull/6788#issuecomment-1258943919

   @YannByron that's exactly the problem: 
   ```
   val row = deserializer.deserialize(originalAvroRecord).get
   val row2 = deserializer.deserialize(originalAvroRecord2).get // deserialize originalAvroRecord2
   assert(row != row2) // without this PR, row and row2 are the same object.
   ```
   
   You're retaining a reference to the internal Row object returned by 
`AvroDeserializer`. Spark is unequivocal about this: if you want to retain a 
reference to a row, you _have to_ copy it, since the row might be stored in a 
reusable buffer. Without the copy, you might be holding a reference to a buffer 
that will be overwritten by a subsequent invocation.
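   
   To make the failure mode concrete, here is a minimal sketch in plain Scala. 
`MutableRow` and `ReusingDeserializer` are hypothetical stand-ins I made up 
for illustration, not Spark or Hudi classes; they only mimic the "one buffer, 
overwritten on every call" pattern. With real Spark rows the equivalent of 
`copy()` here would be `InternalRow.copy()`.

   ```scala
   // Hypothetical stand-in for a mutable row backed by a reusable buffer.
   final class MutableRow(var value: String) {
     // Returns an independent snapshot of the current contents.
     def copy(): MutableRow = new MutableRow(value)
   }

   // Hypothetical deserializer that hands out the SAME row object every time.
   final class ReusingDeserializer {
     private val buffer = new MutableRow("")

     def deserialize(record: String): MutableRow = {
       buffer.value = record // overwrites whatever the previous call produced
       buffer
     }
   }

   object RowReuseDemo {
     def main(args: Array[String]): Unit = {
       val deserializer = new ReusingDeserializer

       val retained = deserializer.deserialize("first")      // aliases the buffer
       val copied = deserializer.deserialize("first").copy() // independent snapshot
       val second = deserializer.deserialize("second")       // overwrites the buffer

       assert(retained eq second)         // same object, as in the snippet above
       assert(retained.value == "second") // the retained reference was clobbered
       assert(copied.value == "first")    // the copy kept its original contents
     }
   }
   ```

   The point is that the caller, not the deserializer, owns the decision to 
copy: the deserializer stays allocation-free, and only code that needs to hold 
on to a row pays for the copy.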
   
   Unfortunately, this mistake is all too easy to make with Spark and hard to 
trace. We just need to be vigilant about it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.