yihua commented on code in PR #6028:
URL: https://github.com/apache/hudi/pull/6028#discussion_r915125780
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -523,17 +523,14 @@ object HoodieSparkSqlWriter {
val params: mutable.Map[String, String] = collection.mutable.Map(parameters.toSeq: _*)
params(HoodieWriteConfig.AVRO_SCHEMA_STRING.key) = schema.toString
val writeConfig = DataSourceUtils.createHoodieConfig(schema.toString, path, tblName, mapAsJavaMap(params))
- val bulkInsertPartitionerRows: BulkInsertPartitioner[Dataset[Row]] = if (populateMetaFields) {
+ val bulkInsertPartitionerRows: BulkInsertPartitioner[Dataset[Row]] = {
Review Comment:
We have to keep the if-else branch on `populateMetaFields` here, for the reason given in the comment on HoodieDatasetBulkInsertHelper.java below.
##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java:
##########
@@ -182,8 +184,8 @@ public static Dataset<Row> prepareHoodieDatasetForBulkInsertWithoutMetaFields(Da
allCols.addAll(metaFields);
allCols.addAll(originalFields);
- return rowsWithMetaCols.select(
- JavaConverters.collectionAsScalaIterableConverter(allCols).asScala().toSeq());
+ return bulkInsertPartitionerRows.repartitionRecords(rowsWithMetaCols.select(
Review Comment:
Some of the partitioners rely on the meta fields for sorting, so if the meta
fields are not present, the repartitioning or sorting fails.
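
To illustrate the concern, here is a minimal sketch in plain Java with no Spark or Hudi dependency. Rows are modeled as maps, and `RowPartitioner`, `SORTING`, `NON_SORTING`, and `choose` are hypothetical stand-ins rather than Hudi's actual classes. Only the two meta column names are taken from Hudi. It shows why the partitioner choice has to stay conditional on `populateMetaFields`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class MetaFieldPartitionerSketch {
    // Hudi meta column names; everything else in this class is hypothetical.
    static final String PARTITION_PATH = "_hoodie_partition_path";
    static final String RECORD_KEY = "_hoodie_record_key";

    /** Simplified stand-in for a row-based bulk insert partitioner. */
    interface RowPartitioner {
        List<Map<String, String>> repartitionRecords(List<Map<String, String>> rows);
    }

    /** Sorts by the meta columns, so it cannot work when they are absent. */
    static final RowPartitioner SORTING = rows -> {
        for (Map<String, String> row : rows) {
            if (!row.containsKey(PARTITION_PATH) || !row.containsKey(RECORD_KEY)) {
                throw new IllegalStateException("meta fields missing: cannot sort");
            }
        }
        List<Map<String, String>> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator
                .comparing((Map<String, String> r) -> r.get(PARTITION_PATH))
                .thenComparing(r -> r.get(RECORD_KEY)));
        return sorted;
    };

    /** Leaves the rows in their incoming order; needs no meta columns. */
    static final RowPartitioner NON_SORTING = rows -> new ArrayList<>(rows);

    /** The if-else under discussion: only sort when meta fields are populated. */
    static RowPartitioner choose(boolean populateMetaFields) {
        return populateMetaFields ? SORTING : NON_SORTING;
    }

    public static void main(String[] args) {
        List<Map<String, String>> withMeta = List.of(
                Map.of(PARTITION_PATH, "2022/07/02", RECORD_KEY, "k2", "value", "b"),
                Map.of(PARTITION_PATH, "2022/07/01", RECORD_KEY, "k1", "value", "a"));
        List<Map<String, String>> withoutMeta = List.of(
                Map.of("value", "b"),
                Map.of("value", "a"));

        // With meta fields, the sorting partitioner orders rows by partition path, then key.
        System.out.println(choose(true).repartitionRecords(withMeta));
        // Without meta fields, the guarded choice falls back to the non-sorting partitioner.
        System.out.println(choose(false).repartitionRecords(withoutMeta));
        // Applying the sorting partitioner unconditionally fails on rows without meta fields.
        try {
            SORTING.repartitionRecords(withoutMeta);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In the actual code the failure would surface inside Spark when a meta column cannot be resolved, not as this explicit check, but the shape of the problem is the same.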