[GitHub] [hudi] Karl-WangSK commented on a change in pull request #2106: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

GitBox Thu, 22 Oct 2020 00:50:49 -0700


Karl-WangSK commented on a change in pull request #2106:
URL: https://github.com/apache/hudi/pull/2106#discussion_r509950306




##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkWriteHelper.java
##########
@@ -58,7 +60,8 @@ public static SparkWriteHelper newInstance() {
       return new Tuple2<>(key, record);
     }).reduceByKey((rec1, rec2) -> {
       @SuppressWarnings("unchecked")
-      T reducedData = (T) rec1.getData().preCombine(rec2.getData());
+      T reducedData = schema != null && !schema.get().isEmpty() ? (T) rec1.getData().preCombine(rec2.getData(), new Schema.Parser().parse(schema.get()))

Review comment:
       Um, but we added this default method to `HoodieRecordPayload`:
   ```
   default T preCombine(T another, Schema schema) throws IOException {
     return preCombine(another);
   }
   ```
   This means the old payload classes simply ignore the schema, because they don't need it. Only the classes added in this PR take advantage of it.
   One problem, though, is that the schema gets parsed for every record, for every payload type, whether the payload needs it or not, which will hurt performance.
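The parsing cost mentioned above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in and not the Hudi or Avro API: `ParsedSchema` replaces the real schema type and only counts parse calls, `Payload` mimics the shape of the default method quoted above, and `LatestWins` plays the role of an "old" payload that ignores the schema. It compares parsing inside each merge with parsing once before the merges start. How a parsed schema would reach Spark executors is not addressed here.

```java
// Hypothetical stand-ins only: these types mimic the shapes discussed in the
// review comment; they are NOT the Hudi or Avro classes.
class PreCombineSketch {

  /** Stand-in for a parsed schema; counts how often parsing happens. */
  static final class ParsedSchema {
    static int parseCount = 0;
    final String json;

    private ParsedSchema(String json) {
      this.json = json;
    }

    static ParsedSchema parse(String json) {
      parseCount++; // a real parser would do comparatively expensive work here
      return new ParsedSchema(json);
    }
  }

  /** Stand-in payload contract with a schema-aware default method. */
  interface Payload<T extends Payload<T>> {
    T preCombine(T another);

    // Default: payloads that do not override this simply ignore the schema.
    default T preCombine(T another, ParsedSchema schema) {
      return preCombine(another);
    }
  }

  /** An "old" payload: keeps the record with the larger ordering value. */
  static final class LatestWins implements Payload<LatestWins> {
    final long orderingVal;

    LatestWins(long orderingVal) {
      this.orderingVal = orderingVal;
    }

    @Override
    public LatestWins preCombine(LatestWins another) {
      return another.orderingVal > orderingVal ? another : this;
    }
  }

  /** Parses the schema inside every merge, as the changed line does. */
  static LatestWins reduceParsingPerRecord(LatestWins[] records, String schemaJson) {
    LatestWins acc = records[0];
    for (int i = 1; i < records.length; i++) {
      acc = acc.preCombine(records[i], ParsedSchema.parse(schemaJson));
    }
    return acc;
  }

  /** Parses the schema once and reuses the parsed object for every merge. */
  static LatestWins reduceParsingOnce(LatestWins[] records, String schemaJson) {
    ParsedSchema schema = ParsedSchema.parse(schemaJson);
    LatestWins acc = records[0];
    for (int i = 1; i < records.length; i++) {
      acc = acc.preCombine(records[i], schema);
    }
    return acc;
  }

  public static void main(String[] args) {
    LatestWins[] records = {new LatestWins(1), new LatestWins(3), new LatestWins(2)};

    ParsedSchema.parseCount = 0;
    reduceParsingPerRecord(records, "{}");
    System.out.println("parses when parsing per merge: " + ParsedSchema.parseCount);

    ParsedSchema.parseCount = 0;
    reduceParsingOnce(records, "{}");
    System.out.println("parses when parsing once: " + ParsedSchema.parseCount);
  }
}
```

In this sketch both variants return the same merged record, and the payload never looks at the schema; only the number of parse calls differs. Whether hoisting the parse is practical in `SparkWriteHelper` depends on details this sketch leaves out.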




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
