cdmikechen commented on a change in pull request #1073: [HUDI-377] Adding
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r361347283
##########
File path:
hudi-spark/src/main/java/org/apache/hudi/OverwriteWithLatestAvroPayload.java
##########
@@ -61,8 +60,15 @@ public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload
   @Override
   public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
+
Review comment:
@vinothchandar
> Doing this in getInsertValue() means even inserts with the flag set will
be deleted.. Not sure if this is intended behavior.. We only want to delete if
updating and marker set?
If this runs in a Kappa architecture, it works. But in a Lambda-style
architecture the data sometimes has to be rebuilt, and the rebuild may bulk
insert the whole change log, which can include records with the delete flag
set.
Of course, this is just my assumption. Our test cases may not have hit this
yet. If I am overthinking it and it does not occur in practice, please ignore
my review.
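To make the two behaviors concrete, here is a minimal sketch, not the actual
Hudi code. It assumes records are plain maps, uses `java.util.Optional` as a
stand-in for Hudi's `Option`, and invents the flag name `is_deleted` and all
method names purely for illustration.

```java
import java.util.Map;
import java.util.Optional;

public class DeleteMarkerSketch {
    // Hypothetical flag name, used only for this illustration.
    public static final String DELETE_FLAG = "is_deleted";

    public static boolean isDeleteMarked(Map<String, Object> record) {
        return Boolean.TRUE.equals(record.get(DELETE_FLAG));
    }

    // Insert path, variant A: honor the marker, so a flagged record is
    // dropped even when it arrives as a plain insert (e.g. a bulk rebuild).
    public static Optional<Map<String, Object>> insertHonoringMarker(
            Map<String, Object> incoming) {
        return isDeleteMarked(incoming) ? Optional.empty() : Optional.of(incoming);
    }

    // Insert path, variant B: ignore the marker, so a flagged record is
    // written as-is on insert.
    public static Optional<Map<String, Object>> insertIgnoringMarker(
            Map<String, Object> incoming) {
        return Optional.of(incoming);
    }

    // Update path: a flagged incoming record removes the stored one;
    // otherwise the incoming record overwrites it.
    public static Optional<Map<String, Object>> update(
            Map<String, Object> current, Map<String, Object> incoming) {
        return isDeleteMarked(incoming) ? Optional.empty() : Optional.of(incoming);
    }
}
```

With variant B, replaying a full change log through bulk insert would keep
flagged records; with variant A they would be dropped on the way in.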
> do you have a performance concern here? `Option.of` should be very cheap
right.. In any case, we can achieve the effect of what you mean, by simply
hanging on to the original Option[GenericRecord]?
Yes, `Option.of` may allocate a new object. I personally feel that when an
object already exists, we should reuse it rather than create a new one, unless
there is a specific need.
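A small sketch of the difference, under stated assumptions: `Opt` below is a
minimal stand-in wrapper written for this example, not Hudi's `Option` class,
and the method names `rewrap` and `reuse` are hypothetical.

```java
import java.util.Objects;

public class ReuseOptionSketch {
    // Minimal stand-in for an Option wrapper; not Hudi's actual class.
    public static final class Opt<T> {
        private final T value;

        private Opt(T value) {
            this.value = value;
        }

        public static <T> Opt<T> of(T value) {
            // Every call allocates a fresh wrapper around the same value.
            return new Opt<>(Objects.requireNonNull(value));
        }

        public T get() {
            return value;
        }
    }

    // Unwraps and wraps again: same payload, but a new wrapper object.
    public static <T> Opt<T> rewrap(Opt<T> original) {
        return Opt.of(original.get());
    }

    // Hangs on to the original wrapper: no allocation at all.
    public static <T> Opt<T> reuse(Opt<T> original) {
        return original;
    }
}
```

Both methods return a wrapper around the same record; only `reuse` avoids the
extra allocation, which is cheap but not free.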