Ambarish-Giri commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-918144599


   Hi @nsivabalan,
   
   I have tried changing the index type to the Simple Index as well. Below are my upsert and bulk-insert configurations respectively:
   Upsert
   ------
   
   userSegDf.write
     .format("hudi")
     .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
     .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, keyGenClass)
     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
     .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, partitionKey)
     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
     .option(HoodieWriteConfig.TABLE_NAME, tableName)
     .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.SIMPLE.toString())
     .option(HoodieIndexConfig.SIMPLE_INDEX_PARALLELISM_PROP, 200)
     .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
     .option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, true)
     .option(HoodieWriteConfig.UPSERT_PARALLELISM, customNumPartitions)
     .option(HoodieWriteConfig.COMBINE_BEFORE_UPSERT_PROP, false)
     .option(HoodieWriteConfig.WRITE_BUFFER_LIMIT_BYTES, 41943040)
     .option(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, 100)
     .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, true)
     .mode(SaveMode.Append)
     .save(s"$basePath/$tableName/")
   
   Bulk-Insert :
   ------------
   userSegDf.write
     .format("hudi")
     .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
     .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, keyGenClass)
     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
     .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, partitionKey)
     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
     .option(HoodieWriteConfig.TABLE_NAME, tableName)
     .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.SIMPLE.toString())
     .option(HoodieIndexConfig.SIMPLE_INDEX_PARALLELISM_PROP, 200)
     .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
     .option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, true)
     .option(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP, false)
     .option(HoodieWriteConfig.WRITE_BUFFER_LIMIT_BYTES, 41943040)
     .option(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, 100)
     .option(HoodieWriteConfig.BULKINSERT_SORT_MODE, BulkInsertSortMode.NONE.toString())
     .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, true)
     .mode(SaveMode.Overwrite)
     .save(s"$basePath/$tableName/")
   
   Using the Simple Index helped a bit, but the stage below has now been running for more than 2 hours; it is progressing, but very slowly:
   
   
https://github.com/apache/hudi/blob/3e71c915271d77c7306ca0325b212f71ce723fc0/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L154
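   For reference, a hedged sketch of what I could still try next: the same upsert expressed with plain string config keys, raising the parallelism knobs that drive the index-tagging and write stages. The value 500 is a placeholder to experiment with, not a recommendation, and `userSegDf`, `basePath`, and `tableName` are the same variables as in the snippets above.
   
   ```scala
   // Hedged sketch, not a verified fix: bump the parallelism of the
   // simple-index lookup join and the upsert shuffle, which back the
   // slow stage linked above. Tune the values to the cluster size.
   import org.apache.spark.sql.SaveMode
   
   userSegDf.write
     .format("hudi")
     // ...same key generator / record key / partition path / table options as above...
     .option("hoodie.index.type", "SIMPLE")
     // parallelism of the join that tags incoming records against existing files
     .option("hoodie.simple.index.parallelism", "500")
     // shuffle parallelism for the upsert write stage itself
     .option("hoodie.upsert.shuffle.parallelism", "500")
     .mode(SaveMode.Append)
     .save(s"$basePath/$tableName/")
   ```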
   
   Let me know in case any more details are required.
   
   

