Ambarish-Giri commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-918144599
Hi @nsivabalan,
I have also tried changing the index type to SIMPLE; below are my upsert and
bulk-insert configurations:
Upsert
------
userSegDf.write
  .format("hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, keyGenClass)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, partitionKey)
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
  .option(HoodieWriteConfig.TABLE_NAME, tableName)
  .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.SIMPLE.toString())
  .option(HoodieIndexConfig.SIMPLE_INDEX_PARALLELISM_PROP, 200)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, true)
  .option(HoodieWriteConfig.UPSERT_PARALLELISM, customNumPartitions)
  .option(HoodieWriteConfig.COMBINE_BEFORE_UPSERT_PROP, false)
  .option(HoodieWriteConfig.WRITE_BUFFER_LIMIT_BYTES, 41943040)
  .option(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, 100)
  .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, true)
  .mode(SaveMode.Append)
  .save(s"$basePath/$tableName/")
Bulk-Insert :
------------
userSegDf.write
  .format("hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, keyGenClass)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, partitionKey)
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
  .option(HoodieWriteConfig.TABLE_NAME, tableName)
  .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.SIMPLE.toString())
  .option(HoodieIndexConfig.SIMPLE_INDEX_PARALLELISM_PROP, 200)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, true)
  .option(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP, false)
  .option(HoodieWriteConfig.WRITE_BUFFER_LIMIT_BYTES, 41943040)
  .option(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, 100)
  .option(HoodieWriteConfig.BULKINSERT_SORT_MODE, BulkInsertSortMode.NONE.toString())
  .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, true)
  .mode(SaveMode.Overwrite)
  .save(s"$basePath/$tableName/")
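As an aside, the two writer calls above share most of their options; a sketch of factoring the shared ones into a single Map of plain string config keys, which `.options(...)` can consume. The string key names here are my assumption based on recent Hudi releases (the constant names vary across versions), and the field values are placeholders, not the actual table's:

```scala
// Sketch only: shared Hudi options as plain string config keys, reusable by
// both the upsert and the bulk-insert writer via .options(commonOpts).
// Key names assumed from recent Hudi releases -- verify against the
// HoodieWriteConfig / HoodieIndexConfig constants in your version.
val tableName    = "user_seg"    // placeholder table name
val key          = "user_id"     // placeholder record key field
val partitionKey = "segment"     // placeholder partition path field
val combineKey   = "updated_at"  // placeholder precombine field

val commonOpts: Map[String, String] = Map(
  "hoodie.table.name"                               -> tableName,
  "hoodie.datasource.write.table.type"              -> "COPY_ON_WRITE",
  "hoodie.datasource.write.recordkey.field"         -> key,
  "hoodie.datasource.write.partitionpath.field"     -> partitionKey,
  "hoodie.datasource.write.precombine.field"        -> combineKey,
  "hoodie.index.type"                               -> "SIMPLE",
  "hoodie.simple.index.parallelism"                 -> "200",
  "hoodie.datasource.write.hive_style_partitioning" -> "true"
)

// Each call then only adds its per-operation overrides, e.g.:
// userSegDf.write.format("hudi").options(commonOpts)
//   .option("hoodie.datasource.write.operation", "upsert")
//   ...
```

This also makes it easier to diff the two configurations when debugging, since only the operation-specific options remain inline.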
Using the SIMPLE index helped a bit, but the stage below has now been running
for more than 2 hours; it is progressing, though very slowly:
https://github.com/apache/hudi/blob/3e71c915271d77c7306ca0325b212f71ce723fc0/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L154
Let me know in case any more details are required.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]