jonvex commented on code in PR #7413:
URL: https://github.com/apache/hudi/pull/7413#discussion_r1044633437
##########
hudi-client/hudi-java-client/src/test/java/org/apache/hudi/execution/bulkinsert/TestJavaBulkInsertInternalPartitioner.java:
##########
@@ -63,9 +65,11 @@ public void testCustomColumnSortPartitioner(String sortColumnString) throws Exce
getCustomColumnComparator(HoodieTestDataGenerator.AVRO_SCHEMA, sortColumns);
List<HoodieRecord> records = generateTestRecordsForBulkInsert(1000);
+ HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath("basePath").build();
+ cfg.setValue(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME, "partition_path");
testBulkInsertInternalPartitioner(
- new JavaCustomColumnsSortPartitioner(sortColumns, HoodieTestDataGenerator.AVRO_SCHEMA, false),
- records, true, generatePartitionNumRecords(records), Option.of(columnComparator));
+ new JavaCustomColumnsSortPartitioner(sortColumns, HoodieTestDataGenerator.AVRO_SCHEMA, false, cfg),
+ records, false, generatePartitionNumRecords(records), Option.of(columnComparator));
Review Comment:
So this is one of the tests I was concerned about. I looked into the data generator: in TRIP_EXAMPLE_SCHEMA the partition path field is "partition_path", but sortColumnString is set to "rider" and "rider,driver". Because the partition path is not among the sort columns, the partitioner's arePartitionRecordsSorted method will return false. That's why I changed the expected value passed to testBulkInsertInternalPartitioner from true to false.
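
To make that reasoning concrete, here is a minimal, self-contained sketch of a check that yields false for these sort columns. partitionPathLeadsSort is a hypothetical helper, not a Hudi method, and the rule it encodes is an assumption for illustration only: records count as partition-sorted when the partition path field is the first sort column, so records of the same partition end up contiguous. Hudi's actual arePartitionRecordsSorted logic may differ.

```java
public class SortedFlagCheck {

  // Hypothetical rule, NOT Hudi's implementation: partition records are treated as
  // "sorted" only when the partition path field is the leading sort column, which
  // guarantees records of the same partition are contiguous after sorting.
  static boolean partitionPathLeadsSort(String[] sortColumns, String partitionPathField) {
    return sortColumns.length > 0 && sortColumns[0].trim().equals(partitionPathField);
  }

  public static void main(String[] args) {
    // Matches the value set via KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME in the diff.
    String partitionPath = "partition_path";
    // The first two mirror the test's sortColumnString values; the third is a contrast case.
    for (String sortColumnString : new String[] {"rider", "rider,driver", "partition_path,rider"}) {
      String[] sortColumns = sortColumnString.split(",");
      System.out.println(sortColumnString + " -> " + partitionPathLeadsSort(sortColumns, partitionPath));
    }
  }
}
```

Running main prints false for "rider" and "rider,driver" and true only for the contrast case that puts partition_path first.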

Looking at this test with fresh eyes today, I don't think it is useful in its current form. It builds the expected order by calling Collections.sort with HoodieAvroUtils.getRecordColumnValues as the sort key, and then compares that with the output of partitioner.repartitionRecords. JavaCustomColumnsSortPartitioner.repartitionRecords does essentially the same thing, so a bug in our partitioner would likely be mirrored in the expected list, and I don't think this test would fail. To make this a good test, I think we need a list of records that we know is sorted correctly independently of the partitioner. We then shuffle that list, repartition the shuffled copy, and compare the result with the original list that we know is correct.
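
A minimal sketch of the proposed test shape follows. To stay self-contained it uses a hypothetical Rec class in place of HoodieRecord and takes the partitioner as a plain UnaryOperator, so none of the names below (Rec, knownSorted, sortsBackToKnownOrder, CORRECT, BUGGY) are Hudi APIs. The expected list is built in (rider, driver) order by construction rather than by calling a sort routine, which keeps the expectation independent of the code under test. BUGGY applies the sort columns in the wrong order to show that this style of test catches it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Random;
import java.util.function.UnaryOperator;

public class KnownOrderSortCheck {

  // Hypothetical stand-in for a HoodieRecord carrying the two sort columns.
  static final class Rec {
    final String rider;
    final String driver;

    Rec(String rider, String driver) {
      this.rider = rider;
      this.driver = driver;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Rec)) {
        return false;
      }
      Rec other = (Rec) o;
      return rider.equals(other.rider) && driver.equals(other.driver);
    }

    @Override
    public int hashCode() {
      return Objects.hash(rider, driver);
    }
  }

  // Builds records already ordered by (rider, driver) without calling any sort routine.
  // Zero-padded names keep lexicographic order equal to numeric order.
  static List<Rec> knownSorted(int riders, int driversPerRider) {
    List<Rec> out = new ArrayList<>();
    for (int r = 0; r < riders; r++) {
      for (int d = 0; d < driversPerRider; d++) {
        out.add(new Rec(String.format("rider-%03d", r), String.format("driver-%03d", d)));
      }
    }
    return out;
  }

  // Shuffles a copy of the known-good list, runs the partitioner on it, and checks
  // that the output matches the known-good order exactly.
  static boolean sortsBackToKnownOrder(UnaryOperator<List<Rec>> repartition, long seed) {
    List<Rec> expected = knownSorted(10, 5);
    List<Rec> input = new ArrayList<>(expected);
    Collections.shuffle(input, new Random(seed));
    return expected.equals(repartition.apply(input));
  }

  // A correct partitioner: sorts by rider, then driver.
  static final UnaryOperator<List<Rec>> CORRECT = in -> {
    List<Rec> copy = new ArrayList<>(in);
    copy.sort(Comparator.comparing((Rec x) -> x.rider).thenComparing((Rec x) -> x.driver));
    return copy;
  };

  // A buggy partitioner: applies the sort columns in the wrong order (driver, then rider).
  static final UnaryOperator<List<Rec>> BUGGY = in -> {
    List<Rec> copy = new ArrayList<>(in);
    copy.sort(Comparator.comparing((Rec x) -> x.driver).thenComparing((Rec x) -> x.rider));
    return copy;
  };

  public static void main(String[] args) {
    System.out.println(sortsBackToKnownOrder(CORRECT, 42L)); // prints true
    System.out.println(sortsBackToKnownOrder(BUGGY, 42L)); // prints false
  }
}
```

Because every (rider, driver) pair is unique, both comparators define a total order, so the result does not depend on the shuffle seed. Adapting this to the real test would presumably mean wrapping the partitioner as `in -> partitioner.repartitionRecords(in, outputPartitions)` and comparing HoodieRecords by record key; both are assumptions about how the test would be rewritten, not part of this PR.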