alexeykudinkin commented on code in PR #7003:
URL: https://github.com/apache/hudi/pull/7003#discussion_r1034075381
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/commmon/model/HoodieSparkRecord.java:
##########
@@ -183,11 +182,13 @@ public HoodieRecord rewriteRecord(Schema recordSchema, Properties props, Schema
StructType structType = HoodieInternalRowUtils.getCachedSchema(recordSchema);
StructType targetStructType = HoodieInternalRowUtils.getCachedSchema(targetSchema);
- boolean containMetaFields = hasMetaFields(structType);
- UTF8String[] metaFields = tryExtractMetaFields(data, structType);
+ InternalRow rewriteRecord = HoodieInternalRowUtils.rewriteRecord(this.data, structType, targetStructType);
+ UnsafeRow unsafeRow = HoodieInternalRowUtils.getCachedUnsafeProjection(targetStructType, targetStructType).apply(rewriteRecord);
- // TODO add actual rewriting
- InternalRow finalRow = new HoodieInternalRow(metaFields, data, containMetaFields);
+ boolean containMetaFields = hasMetaFields(targetStructType);
+ UTF8String[] metaFields = tryExtractMetaFields(unsafeRow, targetStructType);
+ HoodieInternalRow internalRow = new HoodieInternalRow(metaFields, unsafeRow, containMetaFields);
+ InternalRow finalRow = copy ? internalRow.copy() : internalRow;
Review Comment:
@wzx140 I understand your point. Here's my take: remember our discussion
regarding making copies in the ctor? The same logic applies here:
- We want to make as few copies as possible (less copying -> less compute
needed to handle the record)
- To make as few copies as possible, copying should be lazy
- We can't reason about whether a copy is warranted w/o knowing the
context in which the record will be used (i.e., will we need to retain a
reference to the row?)
Based on that: since we're applying the `UnsafeProjection` here, the resulting
row will point into the shared buffer of the projection itself. However, `rewrite`
operations in Hudi are most often done right before the record is written out to
the file, meaning there's simply no need to make a copy of it (since we're
not going to hold any references to it).
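
To illustrate the trade-off, here's a minimal sketch in plain Java (no Spark
dependency). It is not Hudi or Spark code: `ReusingProjection`, `rewrite`,
`writeImmediately` and `retain` are hypothetical names, and an `int[]` stands in
for a row. The only assumption carried over from the discussion is that the
projection hands back a reference into a buffer it reuses on the next call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedBufferSketch {

    /** Hypothetical stand-in for a projection that reuses one output buffer across calls. */
    static final class ReusingProjection {
        private final int[] buffer;

        ReusingProjection(int width) {
            this.buffer = new int[width];
        }

        /** Overwrites the shared buffer and returns it; the result is valid only until the next call. */
        int[] apply(int[] input) {
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = i < input.length ? input[i] : 0;
            }
            return buffer;
        }
    }

    /** Mirrors the `copy ? internalRow.copy() : internalRow` choice: the caller decides. */
    static int[] rewrite(ReusingProjection projection, int[] input, boolean copy) {
        int[] projected = projection.apply(input);
        return copy ? projected.clone() : projected;
    }

    /** Write-out path: every row is consumed before the next projection, so no copy is needed. */
    static List<String> writeImmediately(ReusingProjection projection, int[][] inputs) {
        List<String> written = new ArrayList<>();
        for (int[] input : inputs) {
            written.add(Arrays.toString(rewrite(projection, input, false)));
        }
        return written;
    }

    /** Retaining path: references outlive the next projection, so only copies stay intact. */
    static List<int[]> retain(ReusingProjection projection, int[][] inputs, boolean copy) {
        List<int[]> retained = new ArrayList<>();
        for (int[] input : inputs) {
            retained.add(rewrite(projection, input, copy));
        }
        return retained;
    }

    public static void main(String[] args) {
        int[][] inputs = {{1, 2}, {3, 4}};
        // Consumed right away: both rows come out correctly without copying.
        System.out.println(writeImmediately(new ReusingProjection(2), inputs));
        // Retained without copying: the first reference now shows the second row.
        System.out.println(Arrays.toString(retain(new ReusingProjection(2), inputs, false).get(0)));
        // Retained with copying: the first row is preserved.
        System.out.println(Arrays.toString(retain(new ReusingProjection(2), inputs, true).get(0)));
    }
}
```

In this sketch the copy only matters on the `retain` path, which is why it makes
sense to leave the decision to the caller rather than copying eagerly inside the
rewrite itself.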