codope commented on code in PR #7038:
URL: https://github.com/apache/hudi/pull/7038#discussion_r1012548734
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -499,7 +499,7 @@ private Pair<SchemaProvider, Pair<String, JavaRDD<HoodieRecord>>> fetchFromSourc
return new HoodieAvroRecord<>(keyGenerator.getKey(record), payload);
});
- return Pair.of(schemaProvider, Pair.of(checkpointStr, records));
+ return new ReadResult(schemaProvider, checkpointStr, records, false);
Review Comment:
Could we drop the `avroRDDOptional.get().isEmpty()` check at L484 by pulling
the `DataSourceUtils.dropDuplicates` call up into this method? We could then
check `isEmpty` here at L502 when instantiating `ReadResult`, instead of
hardcoding `false`. That would still be just one `isEmpty` call.
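
To make the suggestion concrete, here is a rough sketch, not the actual `DeltaSync` code. It assumes the fourth `ReadResult` constructor argument is the emptiness flag, uses a plain `List<String>` of record keys in place of `JavaRDD<HoodieRecord>`, and uses a hypothetical `dropDuplicates` stand-in that filters out keys already present in the table. The class name `ReadResultSketch` and all field names are made up for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ReadResultSketch {

  // Hypothetical stand-in for the PR's ReadResult; field names are assumptions.
  static final class ReadResult {
    final String checkpointStr;
    final List<String> records;
    final boolean isEmpty;

    ReadResult(String checkpointStr, List<String> records, boolean isEmpty) {
      this.checkpointStr = checkpointStr;
      this.records = records;
      this.isEmpty = isEmpty;
    }
  }

  // Hypothetical stand-in for DataSourceUtils.dropDuplicates: keeps only the
  // keys that are not already present in the table.
  static List<String> dropDuplicates(List<String> incoming, Set<String> existingKeys) {
    return incoming.stream()
        .filter(key -> !existingKeys.contains(key))
        .collect(Collectors.toList());
  }

  // Deduplication is pulled up into the fetch method, so emptiness is checked
  // exactly once, at the point where the result is constructed.
  static ReadResult fetchFromSource(List<String> fetched, Set<String> existingKeys,
                                    String checkpointStr) {
    List<String> records = dropDuplicates(fetched, existingKeys);
    return new ReadResult(checkpointStr, records, records.isEmpty());
  }

  public static void main(String[] args) {
    ReadResult result = fetchFromSource(List.of("k1", "k2"), Set.of("k2"), "ckpt-1");
    System.out.println(result.records + " isEmpty=" + result.isEmpty);
  }
}
```

With a real `JavaRDD`, `isEmpty` triggers a Spark job, which is why it matters that the check happens only once.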
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -962,4 +964,34 @@ private Set<String> getPartitionColumns(KeyGenerator keyGenerator, TypedProperti
String partitionColumns = SparkKeyGenUtils.getPartitionColumns(keyGenerator, props);
return Arrays.stream(partitionColumns.split(",")).collect(Collectors.toSet());
}
+
+ class ReadResult {
Review Comment:
+1 for abstracting out this model.