Karl-WangSK commented on a change in pull request #2106:
URL: https://github.com/apache/hudi/pull/2106#discussion_r509950306
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkWriteHelper.java
##########
@@ -58,7 +60,8 @@ public static SparkWriteHelper newInstance() {
return new Tuple2<>(key, record);
}).reduceByKey((rec1, rec2) -> {
@SuppressWarnings("unchecked")
- T reducedData = (T) rec1.getData().preCombine(rec2.getData());
+ T reducedData = schema != null && !schema.get().isEmpty() ? (T) rec1.getData().preCombine(rec2.getData(), new Schema.Parser().parse(schema.get()))
Review comment:
Hmm, but we added this default method to `HoodieRecordPayload`:
```
default T preCombine(T another, Schema schema) throws IOException {
return preCombine(another);
}
```
This means the old payload classes will simply ignore the schema (because they
don't need it). Only the classes that I added in this PR will take advantage
of it.
One problem, though, is that the call site parses the schema for every record,
whether or not the payload needs it, which will hurt performance.
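
To make the trade-off concrete, here is a minimal, self-contained sketch (not the code in this PR). `FakeSchema`, `Payload`, `LazySchema`, `LegacyPayload`, `SchemaAwarePayload` and `PreCombineSketch` are all hypothetical stand-ins: `FakeSchema.parse` only counts calls in place of `new Schema.Parser().parse(...)`, and `Payload` is a cut-down imitation of `HoodieRecordPayload`. It compares parsing inside every reduce call with parsing once and reusing the result, and shows that a legacy payload falls through to the one-argument `preCombine`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for org.apache.avro.Schema; it only counts parse calls.
final class FakeSchema {
  static final AtomicInteger PARSE_COUNT = new AtomicInteger();
  final String json;
  private FakeSchema(String json) { this.json = json; }
  // Stand-in for new Schema.Parser().parse(json).
  static FakeSchema parse(String json) {
    PARSE_COUNT.incrementAndGet();
    return new FakeSchema(json);
  }
}

// Cut-down imitation of HoodieRecordPayload: the schema-aware overload
// defaults to the old one-argument method.
interface Payload<T> {
  T preCombine(T another);
  default T preCombine(T another, FakeSchema schema) {
    return preCombine(another);
  }
}

// Old-style payload: keeps the larger ordering value and never sees the schema.
final class LegacyPayload implements Payload<LegacyPayload> {
  final int orderingVal;
  LegacyPayload(int orderingVal) { this.orderingVal = orderingVal; }
  @Override
  public LegacyPayload preCombine(LegacyPayload another) {
    return another.orderingVal > orderingVal ? another : this;
  }
}

// New-style payload: overrides the overload and records the schema it was given.
final class SchemaAwarePayload implements Payload<SchemaAwarePayload> {
  final int orderingVal;
  FakeSchema seenSchema;
  SchemaAwarePayload(int orderingVal) { this.orderingVal = orderingVal; }
  @Override
  public SchemaAwarePayload preCombine(SchemaAwarePayload another) {
    return another.orderingVal > orderingVal ? another : this;
  }
  @Override
  public SchemaAwarePayload preCombine(SchemaAwarePayload another, FakeSchema schema) {
    SchemaAwarePayload winner = preCombine(another);
    winner.seenSchema = schema;
    return winner;
  }
}

// Parses the schema string on first use and reuses the result afterwards.
final class LazySchema {
  private final String json;
  private FakeSchema parsed;
  LazySchema(String json) { this.json = json; }
  synchronized FakeSchema get() {
    if (parsed == null) {
      parsed = FakeSchema.parse(json);
    }
    return parsed;
  }
}

final class PreCombineSketch {
  // Reduces n records, parsing the schema inside every reduce call.
  static int parsesPerRecord(int n) {
    FakeSchema.PARSE_COUNT.set(0);
    LegacyPayload acc = new LegacyPayload(0);
    for (int i = 1; i < n; i++) {
      acc = acc.preCombine(new LegacyPayload(i), FakeSchema.parse("{}"));
    }
    return FakeSchema.PARSE_COUNT.get();
  }

  // Reduces n records, reusing one lazily parsed schema.
  static int parsesMemoized(int n) {
    FakeSchema.PARSE_COUNT.set(0);
    LazySchema schema = new LazySchema("{}");
    LegacyPayload acc = new LegacyPayload(0);
    for (int i = 1; i < n; i++) {
      acc = acc.preCombine(new LegacyPayload(i), schema.get());
    }
    return FakeSchema.PARSE_COUNT.get();
  }

  public static void main(String[] args) {
    System.out.println("parses, per record: " + parsesPerRecord(1000));
    System.out.println("parses, memoized:   " + parsesMemoized(1000));
  }
}
```

Whether a holder like `LazySchema` fits inside the real `reduceByKey` lambda depends on how the closure is serialized and shipped to executors, which this sketch does not model.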