alexeykudinkin commented on code in PR #7413:
URL: https://github.com/apache/hudi/pull/7413#discussion_r1043972511
##########
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaCustomColumnsSortPartitioner.java:
##########
@@ -40,8 +42,11 @@
private final String[] sortColumnNames;
private final Schema schema;
private final boolean consistentLogicalTimestampEnabled;
+ private final boolean isSorted;
- public JavaCustomColumnsSortPartitioner(String[] columnNames, Schema schema,
boolean consistentLogicalTimestampEnabled) {
Review Comment:
`consistentLogicalTimestampEnabled` could also be fetched from
`HoodieWriteConfig`
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -42,14 +43,20 @@
private final String[] sortColumnNames;
private final SerializableSchema serializableSchema;
private final boolean consistentLogicalTimestampEnabled;
+ private final boolean isSorted;
Review Comment:
Same comment as above
##########
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaCustomColumnsSortPartitioner.java:
##########
@@ -40,8 +42,11 @@
private final String[] sortColumnNames;
private final Schema schema;
private final boolean consistentLogicalTimestampEnabled;
+ private final boolean isSorted;
Review Comment:
Let's clarify the naming `arePartitionPathSorted`
##########
hudi-client/hudi-java-client/src/test/java/org/apache/hudi/execution/bulkinsert/TestJavaBulkInsertInternalPartitioner.java:
##########
@@ -63,9 +65,11 @@ public void testCustomColumnSortPartitioner(String
sortColumnString) throws Exce
getCustomColumnComparator(HoodieTestDataGenerator.AVRO_SCHEMA,
sortColumns);
List<HoodieRecord> records = generateTestRecordsForBulkInsert(1000);
+ HoodieWriteConfig cfg =
HoodieWriteConfig.newBuilder().withPath("basePath").build();
+ cfg.setValue(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME,
"partition_path");
testBulkInsertInternalPartitioner(
- new JavaCustomColumnsSortPartitioner(sortColumns,
HoodieTestDataGenerator.AVRO_SCHEMA, false),
- records, true, generatePartitionNumRecords(records),
Option.of(columnComparator));
+ new JavaCustomColumnsSortPartitioner(sortColumns,
HoodieTestDataGenerator.AVRO_SCHEMA, false, cfg),
+ records, false, generatePartitionNumRecords(records),
Option.of(columnComparator));
Review Comment:
Why did we flip this one to false?
##########
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaCustomColumnsSortPartitioner.java:
##########
@@ -40,8 +42,11 @@
private final String[] sortColumnNames;
private final Schema schema;
private final boolean consistentLogicalTimestampEnabled;
+ private final boolean isSorted;
- public JavaCustomColumnsSortPartitioner(String[] columnNames, Schema schema,
boolean consistentLogicalTimestampEnabled) {
+ public JavaCustomColumnsSortPartitioner(String[] columnNames, Schema schema,
boolean consistentLogicalTimestampEnabled, HoodieWriteConfig config) {
+ String partitionPath =
config.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME);
Review Comment:
This could actually be multiple columns. Let's instead create a KeyGenerator
and fetch `getPartitionPathFields` from it. You can create it using
`HoodieSparkKeyGeneratorFactory.createKeyGenerator`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]