alexeykudinkin commented on code in PR #7411:
URL: https://github.com/apache/hudi/pull/7411#discussion_r1043794079
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/BulkInsertPartitioner.java:
##########
@@ -38,7 +38,7 @@
* @param outputPartitions Expected number of output partitions
* @return
*/
-  I repartitionRecords(I records, int outputPartitions);
+  I repartitionRecords(I records, int outputPartitions, boolean populateMetaFields);
Review Comment:
@codope we should pass this in the ctor (in fact we should be passing the whole
HoodieWriteConfig so the partitioner has access to the full config), rather
than through the API
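The constructor-based approach suggested above could look roughly like the
following sketch. Note this uses simplified stand-in types (`WriteConfig`,
`List<String>` records) purely for illustration, not Hudi's actual
`HoodieWriteConfig` or `Dataset<Row>` API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ConfigCtorSketch {
  // Hypothetical, minimal stand-in for HoodieWriteConfig.
  static class WriteConfig {
    final boolean populateMetaFields;
    WriteConfig(boolean populateMetaFields) {
      this.populateMetaFields = populateMetaFields;
    }
  }

  // The interface keeps its original, narrow signature; the flag is
  // no longer threaded through the API.
  interface BulkInsertPartitioner<I> {
    I repartitionRecords(I records, int outputPartitions);
  }

  // The partitioner receives the full config once, at construction time,
  // so any method can consult any setting without widening the API.
  static class GlobalSortPartitioner implements BulkInsertPartitioner<List<String>> {
    private final WriteConfig writeConfig;

    GlobalSortPartitioner(WriteConfig writeConfig) {
      this.writeConfig = writeConfig;
    }

    @Override
    public List<String> repartitionRecords(List<String> records, int outputPartitions) {
      // Read the flag from the stored config rather than a parameter.
      boolean populateMetaFields = writeConfig.populateMetaFields;
      List<String> sorted = new ArrayList<>(records);
      sorted.sort(Comparator.naturalOrder());
      return sorted;
    }
  }
}
```

This keeps the `BulkInsertPartitioner` contract stable as new config-dependent
behavior is added; only the construction site changes.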
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/GlobalSortPartitionerWithRows.java:
##########
@@ -31,11 +31,14 @@
 public class GlobalSortPartitionerWithRows implements BulkInsertPartitioner<Dataset<Row>> {
   @Override
-  public Dataset<Row> repartitionRecords(Dataset<Row> rows, int outputSparkPartitions) {
+  public Dataset<Row> repartitionRecords(Dataset<Row> rows, int outputSparkPartitions, boolean populateMetaFields) {
     // Now, sort the records and line them up nicely for loading.
     // Let's use "partitionPath + key" as the sort key.
-    return rows.sort(functions.col(HoodieRecord.PARTITION_PATH_METADATA_FIELD), functions.col(HoodieRecord.RECORD_KEY_METADATA_FIELD))
-        .coalesce(outputSparkPartitions);
+    if (populateMetaFields) {
Review Comment:
A few notes:
- We should properly support virtual keys here by instantiating the
key generator and applying it.
- The current approach actually obscures the failure: instead of failing,
we let the call go through but breach the Partitioner's contract by not
actually sorting the rows. Instead, we should just fail outright when
meta fields are disabled (as a stop-gap solution until we fully support
virtual keys).
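The fail-fast stop-gap described above could be sketched as follows. This is a
hedged illustration using simplified stand-in types (`List<String>` records,
an illustrative class name), not the actual Spark `Dataset<Row>` partitioner:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FailFastSketch {
  // Sketch of the suggested stop-gap: throw outright when meta fields are
  // disabled, instead of silently returning unsorted rows and breaching
  // the partitioner's sorting contract.
  static class SortingPartitioner {
    private final boolean populateMetaFields;

    SortingPartitioner(boolean populateMetaFields) {
      this.populateMetaFields = populateMetaFields;
    }

    List<String> repartitionRecords(List<String> records, int outputPartitions) {
      if (!populateMetaFields) {
        // Without meta fields there is no record-key column to sort on;
        // until virtual keys are supported, fail loudly rather than
        // skipping the sort.
        throw new UnsupportedOperationException(
            "Global sort requires meta fields to be populated");
      }
      List<String> sorted = new ArrayList<>(records);
      sorted.sort(Comparator.naturalOrder());
      return sorted;
    }
  }
}
```

A caller that disables meta fields then gets an immediate, explicit error at
bulk-insert time instead of a silently unsorted (and thus mis-partitioned)
output.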
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]