nsivabalan commented on code in PR #5462:
URL: https://github.com/apache/hudi/pull/5462#discussion_r861359728
##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java:
##########
@@ -57,18 +61,18 @@ public class HoodieDatasetBulkInsertHelper {
/**
* Prepares input hoodie spark dataset for bulk insert. It does the following steps.
- * 1. Uses KeyGenerator to generate hoodie record keys and partition path.
- * 2. Add hoodie columns to input spark dataset.
- * 3. Reorders input dataset columns so that hoodie columns appear in the beginning.
- * 4. Sorts input dataset by hoodie partition path and record key
+ * 1. Uses KeyGenerator to generate hoodie record keys and partition path.
+ * 2. Add hoodie columns to input spark dataset.
+ * 3. Reorders input dataset columns so that hoodie columns appear in the beginning.
+ * 4. Sorts input dataset by hoodie partition path and record key
*
* @param sqlContext SQL Context
- * @param config Hoodie Write Config
- * @param rows Spark Input dataset
+ * @param config Hoodie Write Config
+ * @param rows Spark Input dataset
* @return hoodie dataset which is ready for bulk insert.
*/
public static Dataset<Row> prepareHoodieDatasetForBulkInsert(SQLContext sqlContext,
- HoodieWriteConfig config, Dataset<Row> rows, String structName, String recordNamespace,
+ HoodieWriteConfig config, Dataset<Row> rows, String structName, String recordNamespace,
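The four numbered steps in the Javadoc above can be illustrated with a small, Spark-free sketch. Everything here (the class name `BulkInsertSketch`, the flat map-per-row representation, and the meta-column names being filled directly from data columns) is hypothetical and only stands in for Hudi's actual KeyGenerator and Spark Dataset machinery:

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BulkInsertSketch {
    // Hypothetical meta-column names, echoing Hudi's record-key/partition-path fields
    static final String RECORD_KEY_FIELD = "_hoodie_record_key";
    static final String PARTITION_PATH_FIELD = "_hoodie_partition_path";

    // Rows are plain maps standing in for Spark Rows.
    static List<LinkedHashMap<String, String>> prepare(List<Map<String, String>> rows,
                                                       String keyCol, String partitionCol) {
        return rows.stream()
            .map(r -> {
                LinkedHashMap<String, String> out = new LinkedHashMap<>();
                // Step 1: "key generation" - derive record key and partition path
                // Steps 2-3: add hoodie columns, inserted first so they appear
                // at the beginning of the row
                out.put(RECORD_KEY_FIELD, r.get(keyCol));
                out.put(PARTITION_PATH_FIELD, r.get(partitionCol));
                out.putAll(r);
                return out;
            })
            // Step 4: sort by partition path, then record key
            .sorted(Comparator
                .comparing((LinkedHashMap<String, String> r) -> r.get(PARTITION_PATH_FIELD))
                .thenComparing(r -> r.get(RECORD_KEY_FIELD)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> input = List.of(
            Map.of("id", "2", "region", "us"),
            Map.of("id", "1", "region", "eu"),
            Map.of("id", "1", "region", "us"));
        // Rows come back grouped by partition path ("eu" before "us"),
        // then ordered by record key within each partition.
        prepare(input, "id", "region").forEach(System.out::println);
    }
}
```

In the real helper, the sort is what matters for write performance: clustering rows by partition path and key lets bulk insert write each partition's files sequentially instead of hopping between partitions.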
Review Comment:
Yeah, I get it. I wanted to share my patch with a user who was hitting a perf issue,
and hence added everything I had.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]