alexeykudinkin commented on code in PR #6805:
URL: https://github.com/apache/hudi/pull/6805#discussion_r980715777
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:
##########
@@ -318,7 +318,7 @@ class HoodieCDCRDD(
val after = record.get(3).asInstanceOf[GenericRecord]
recordToLoad.update(3,
convertToUTF8String(HoodieCDCUtils.recordToJson(after)))
case HoodieCDCSupplementalLoggingMode.WITH_BEFORE =>
- val row = cdcRecordDeserializer.deserialize(record).get.asInstanceOf[InternalRow]
+ val row = cdcRecordDeserialize(record)
Review Comment:
We don't actually need to copy the row here right away: if a consumer just
iterates over these records and writes them out to a file, there is no issue
(and no need to make additional copies).
However, if there is a place where we retain references to the rows after
iteration, like:
```
val rows = iter.collect()
```
then we need to copy each row before retaining it, like below:
```
val copiedRows = iter.map(_.copy()).collect()
```
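To illustrate why retained references are the problem, here is a minimal, self-contained sketch. It assumes the iterator hands out a single mutable row instance that it overwrites on every `next()` call; `MutableRow` and `ReusedRowDemo` are hypothetical stand-ins for illustration, not Hudi or Spark classes.

```scala
// Hypothetical stand-in for a mutable row (in the spirit of Spark's InternalRow);
// this is NOT the actual Hudi or Spark class.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object ReusedRowDemo {
  // Returns an iterator that yields the SAME row instance on every next(),
  // overwriting its contents each time (the assumed buffer-reuse pattern).
  def reusingIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    values.iterator.map { v =>
      shared.value = v
      shared
    }
  }

  // Streaming consumption: each row is read before the next one is produced,
  // so no copy is needed.
  def streamed(values: Seq[Int]): Seq[Int] =
    reusingIterator(values).map(_.value).toList

  // Retains references without copying: every element aliases the shared row,
  // so all of them reflect the last value written.
  def retainedWithoutCopy(values: Seq[Int]): Seq[Int] =
    reusingIterator(values).toArray.map(_.value).toList

  // Copies each row before retaining it: each element keeps its own value.
  def retainedWithCopy(values: Seq[Int]): Seq[Int] =
    reusingIterator(values).map(_.copy()).toArray.map(_.value).toList

  def main(args: Array[String]): Unit = {
    println(streamed(Seq(1, 2, 3)).mkString(","))
    println(retainedWithoutCopy(Seq(1, 2, 3)).mkString(","))
    println(retainedWithCopy(Seq(1, 2, 3)).mkString(","))
  }
}
```

Under that assumption, only the path that materializes rows without `copy()` loses data, which is why the copy can be deferred to the call sites that actually retain rows.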