alexeykudinkin commented on code in PR #7413:
URL: https://github.com/apache/hudi/pull/7413#discussion_r1055905173


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/BulkInsertPartitioner.java:
##########
@@ -72,4 +79,36 @@ default Option<WriteHandleFactory> getWriteHandleFactory(int 
partitionId) {
     return Option.empty();
   }
 
+  /*
+   * If possible, we want to sort the data by partition path. Doing so will 
reduce the number of files written.
+   **/
+  static String[] prependPartitionPathColumn(String[] columnNames, 
HoodieWriteConfig config) {
+    ArrayList<String> sortCols = new ArrayList<>();

Review Comment:
   We can keep this simpler: 
   
   ```
   if (meta-fields) {
     partitionColumns = 
Arrays.stream(HoodieRecord.HoodieMetadataField.PARTITION_PATH_METADATA_FIELD)
   } else {
     partitionColumns = Arrays.stream(partitionPath.split(","));
   }
   
   // handle it
   ```



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -43,14 +43,14 @@
 
   public RDDCustomColumnsSortPartitioner(HoodieWriteConfig config) {
     this.serializableSchema = new SerializableSchema(new 
Schema.Parser().parse(config.getSchema()));
-    this.sortColumnNames = getSortColumnName(config);
+    this.sortColumnNames = 
BulkInsertPartitioner.prependPartitionPathColumn(getSortColumnName(config), 
config);

Review Comment:
   It will be pre-empted by meta-fields conditional (if these are enabled)
   
   We can add that check into `prependPartitionPathColumn` itself, but in that 
case let's rename it accordingly to  make such contingency clear: 
`tryPrependPartitionColumns`



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/BulkInsertPartitioner.java:
##########
@@ -72,4 +79,37 @@ default Option<WriteHandleFactory> getWriteHandleFactory(int 
partitionId) {
     return Option.empty();
   }
 
+  /*
+   * If possible, we want to sort the data by partition path. Doing so will 
reduce the number of files written.

Review Comment:
   Sorry, that was a typo, i meant "partitioned tables"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to