jonvex commented on code in PR #7413:
URL: https://github.com/apache/hudi/pull/7413#discussion_r1044633437


##########
hudi-client/hudi-java-client/src/test/java/org/apache/hudi/execution/bulkinsert/TestJavaBulkInsertInternalPartitioner.java:
##########
@@ -63,9 +65,11 @@ public void testCustomColumnSortPartitioner(String sortColumnString) throws Exce
         getCustomColumnComparator(HoodieTestDataGenerator.AVRO_SCHEMA, sortColumns);
 
     List<HoodieRecord> records = generateTestRecordsForBulkInsert(1000);
+    HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath("basePath").build();
+    cfg.setValue(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME, "partition_path");
     testBulkInsertInternalPartitioner(
-        new JavaCustomColumnsSortPartitioner(sortColumns, HoodieTestDataGenerator.AVRO_SCHEMA, false),
-        records, true, generatePartitionNumRecords(records), Option.of(columnComparator));
+        new JavaCustomColumnsSortPartitioner(sortColumns, HoodieTestDataGenerator.AVRO_SCHEMA, false, cfg),
+        records, false, generatePartitionNumRecords(records), Option.of(columnComparator));

Review Comment:
   So this is one of the tests I was concerned about. I looked into the data
generator: in TRIP_EXAMPLE_SCHEMA the partition path is "partition_path", but
sortColumnString is set to "rider" and "rider,driver". Since the partition path
is not one of the sort columns, the partitioner's arePartitionRecordsSorted
method will return false, which is why I changed the expected value from true
to false.

Looking at this test with fresh eyes today, I don't think it is useful in its
current form. The test uses HoodieAvroUtils.getRecordColumnValues as the sort
key, calls Collections.sort, and then compares the result with the output of
partitioner.repartitionRecords.
JavaCustomColumnsSortPartitioner.repartitionRecords does essentially the same
thing, so if there were something wrong with our partitioner I don't think this
test would fail.

To make this a good test, I think we need a list of records that we know is
sorted correctly, which we then shuffle. After repartitioning the shuffled
list, we can compare the result with the original list that we know is correct.
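
A minimal sketch of that idea, using plain Java collections rather than Hudi types. Everything here is an assumption made for illustration: the class name `ShuffleThenSortCheck`, the `sortUnderTest` method (a hypothetical stand-in for the partitioner's repartitioning step), and the hand-written rows. The point is only that the expected order is written down independently, so it is not derived by the same sort logic that is being tested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class ShuffleThenSortCheck {

  // Hypothetical stand-in for the code under test: sorts rows by
  // column 0 ("rider"), then column 1 ("driver"). In the real test this
  // would be the partitioner call; that wiring is not shown here.
  static List<List<String>> sortUnderTest(List<List<String>> rows) {
    Comparator<List<String>> byRider = Comparator.comparing(r -> r.get(0));
    Comparator<List<String>> byRiderThenDriver = byRider.thenComparing(r -> r.get(1));
    List<List<String>> out = new ArrayList<>(rows);
    out.sort(byRiderThenDriver);
    return out;
  }

  // Expected order, written by hand so it does not depend on any comparator.
  static List<List<String>> knownSorted() {
    return Arrays.asList(
        Arrays.asList("rider-a", "driver-x"),
        Arrays.asList("rider-a", "driver-y"),
        Arrays.asList("rider-b", "driver-x"),
        Arrays.asList("rider-c", "driver-z"));
  }

  // Shuffle a copy of the known-good list with a fixed seed, run it through
  // the code under test, and compare against the original order.
  static boolean roundTrips(long seed) {
    List<List<String>> expected = knownSorted();
    List<List<String>> shuffled = new ArrayList<>(expected);
    Collections.shuffle(shuffled, new Random(seed));
    return sortUnderTest(shuffled).equals(expected);
  }

  public static void main(String[] args) {
    System.out.println(roundTrips(42L));
  }
}
```

A fixed seed keeps the shuffle reproducible, so a failure can be replayed. The rows have no duplicate sort keys, so the expected order is unambiguous regardless of whether the sort is stable.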



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.