nsivabalan commented on code in PR #5462:
URL: https://github.com/apache/hudi/pull/5462#discussion_r861359728
##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java:
##########
@@ -57,18 +61,18 @@ public class HoodieDatasetBulkInsertHelper {
/**
* Prepares input hoodie spark dataset for bulk insert. It does the following steps.
- * 1. Uses KeyGenerator to generate hoodie record keys and partition path.
- * 2. Add hoodie columns to input spark dataset.
- * 3. Reorders input dataset columns so that hoodie columns appear in the beginning.
- * 4. Sorts input dataset by hoodie partition path and record key
+ * 1. Uses KeyGenerator to generate hoodie record keys and partition path.
+ * 2. Add hoodie columns to input spark dataset.
+ * 3. Reorders input dataset columns so that hoodie columns appear in the beginning.
+ * 4. Sorts input dataset by hoodie partition path and record key
*
* @param sqlContext SQL Context
- * @param config Hoodie Write Config
- * @param rows Spark Input dataset
+ * @param config Hoodie Write Config
+ * @param rows Spark Input dataset
* @return hoodie dataset which is ready for bulk insert.
*/
public static Dataset<Row> prepareHoodieDatasetForBulkInsert(SQLContext sqlContext,
- HoodieWriteConfig config, Dataset<Row> rows, String structName, String recordNamespace,
+ HoodieWriteConfig config, Dataset<Row> rows, String structName, String recordNamespace,
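The four numbered steps in the Javadoc above can be illustrated with a small, Spark-free sketch. Everything here (the class name `BulkInsertSketch`, the flat map-per-row representation, and the meta-column names being filled directly from data columns) is hypothetical and only stands in for Hudi's actual KeyGenerator and Spark Dataset machinery:

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BulkInsertSketch {
    // Hypothetical meta-column names, echoing Hudi's record-key/partition-path fields
    static final String RECORD_KEY_FIELD = "_hoodie_record_key";
    static final String PARTITION_PATH_FIELD = "_hoodie_partition_path";

    // Rows are plain maps standing in for Spark Rows.
    static List<LinkedHashMap<String, String>> prepare(List<Map<String, String>> rows,
                                                       String keyCol, String partitionCol) {
        return rows.stream()
            .map(r -> {
                LinkedHashMap<String, String> out = new LinkedHashMap<>();
                // Step 1: "key generation" - derive record key and partition path
                // Steps 2-3: add hoodie columns, inserted first so they appear
                // at the beginning of the row
                out.put(RECORD_KEY_FIELD, r.get(keyCol));
                out.put(PARTITION_PATH_FIELD, r.get(partitionCol));
                out.putAll(r);
                return out;
            })
            // Step 4: sort by partition path, then record key
            .sorted(Comparator
                .comparing((LinkedHashMap<String, String> r) -> r.get(PARTITION_PATH_FIELD))
                .thenComparing(r -> r.get(RECORD_KEY_FIELD)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> input = List.of(
            Map.of("id", "2", "region", "us"),
            Map.of("id", "1", "region", "eu"),
            Map.of("id", "1", "region", "us"));
        // Rows come back grouped by partition path ("eu" before "us"),
        // then ordered by record key within each partition.
        prepare(input, "id", "region").forEach(System.out::println);
    }
}
```

In the real helper, the sort is what matters for write performance: clustering rows by partition path and key lets bulk insert write each partition's files sequentially instead of hopping between partitions.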
Review Comment:
Yeah, I get it. I wanted to share my patch with a user who was hitting a perf issue,
and hence added everything I had.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]