alexeykudinkin commented on code in PR #7411:
URL: https://github.com/apache/hudi/pull/7411#discussion_r1043794079


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/BulkInsertPartitioner.java:
##########
@@ -38,7 +38,7 @@
    * @param outputPartitions Expected number of output partitions
    * @return
    */
-  I repartitionRecords(I records, int outputPartitions);
+  I repartitionRecords(I records, int outputPartitions, boolean populateMetaFields);

Review Comment:
   @codope we should pass this in the constructor rather than through the API.
   Ideally we would pass the whole HoodieWriteConfig there, so the partitioner
   can access the full config.
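
A minimal sketch of the constructor-injection idea, for illustration only. `WriteConfigStub`, `Partitioner`, and `SortingPartitioner` are hypothetical stand-ins, not the real Hudi classes; the point is that the interface method keeps its original two-argument shape while the implementation receives its configuration once, at construction time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for HoodieWriteConfig; only one flag is modeled.
final class WriteConfigStub {
  private final boolean populateMetaFields;

  WriteConfigStub(boolean populateMetaFields) {
    this.populateMetaFields = populateMetaFields;
  }

  boolean populateMetaFields() {
    return populateMetaFields;
  }
}

// Hypothetical stand-in for BulkInsertPartitioner: the method signature
// stays free of per-call configuration parameters.
interface Partitioner<I> {
  I repartitionRecords(I records, int outputPartitions);
}

// The implementation reads what it needs from the config in its constructor.
final class SortingPartitioner implements Partitioner<List<String>> {
  private final boolean populateMetaFields;

  SortingPartitioner(WriteConfigStub config) {
    this.populateMetaFields = config.populateMetaFields();
  }

  boolean populateMetaFields() {
    return populateMetaFields;
  }

  @Override
  public List<String> repartitionRecords(List<String> records, int outputPartitions) {
    // The flag is available as a field here; outputPartitions is unused in
    // this toy version, which only sorts a copy of the input.
    List<String> sorted = new ArrayList<>(records);
    Collections.sort(sorted);
    return sorted;
  }
}
```

With this shape, adding another config-driven behavior later would not require touching the interface or its other implementations.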



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitionerWithRows.java:
##########
@@ -31,11 +31,14 @@
 public class GlobalSortPartitionerWithRows implements BulkInsertPartitioner<Dataset<Row>> {
 
   @Override
-  public Dataset<Row> repartitionRecords(Dataset<Row> rows, int outputSparkPartitions) {
+  public Dataset<Row> repartitionRecords(Dataset<Row> rows, int outputSparkPartitions, boolean populateMetaFields) {
     // Now, sort the records and line them up nicely for loading.
     // Let's use "partitionPath + key" as the sort key.
-    return rows.sort(functions.col(HoodieRecord.PARTITION_PATH_METADATA_FIELD), functions.col(HoodieRecord.RECORD_KEY_METADATA_FIELD))
-        .coalesce(outputSparkPartitions);
+    if (populateMetaFields) {

Review Comment:
   
   A few notes:
   
    - We should properly support virtual keys here by instantiating the
      key generator and applying it.
    - The current approach obscures the failure: instead of failing, we let
      the write go through but breach the Partitioner's contract by not
      actually sorting the rows. We should instead fail outright when
      meta-fields are disabled (as a stop-gap until we fully support
      virtual keys).
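
A rough sketch of the fail-fast stop-gap described above, under stated assumptions. `MetaRow` and `FailFastGlobalSortPartitioner` are hypothetical names, plain Java lists stand in for `Dataset<Row>`, and `UnsupportedOperationException` is only a placeholder for whatever exception type the project would actually throw.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a row carrying the two meta columns.
final class MetaRow {
  final String partitionPath;
  final String recordKey;

  MetaRow(String partitionPath, String recordKey) {
    this.partitionPath = partitionPath;
    this.recordKey = recordKey;
  }
}

// Hypothetical partitioner that refuses to run instead of silently
// returning unsorted rows when meta fields are disabled.
final class FailFastGlobalSortPartitioner {
  private final boolean populateMetaFields;

  FailFastGlobalSortPartitioner(boolean populateMetaFields) {
    this.populateMetaFields = populateMetaFields;
  }

  List<MetaRow> repartitionRecords(List<MetaRow> rows) {
    if (!populateMetaFields) {
      // Stop-gap: surface the unsupported case rather than breach the
      // sorting contract.
      throw new UnsupportedOperationException(
          "Global sort requires meta fields; virtual keys are not supported yet");
    }
    // Sort a copy by partition path first, then by record key.
    List<MetaRow> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparing((MetaRow r) -> r.partitionPath)
        .thenComparing(r -> r.recordKey));
    return sorted;
  }
}
```

Full virtual-key support would replace the guard with a sort key computed by the key generator, which this sketch does not attempt.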
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
