Ambarish-Giri commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-918144599
Hi @nsivabalan,
I have also tried changing the index type to SIMPLE; below are my upsert and
bulk-insert configurations:
Upsert
------
userSegDf.write
  .format("hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, keyGenClass)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, partitionKey)
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
  .option(HoodieWriteConfig.TABLE_NAME, tableName)
  .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.SIMPLE.toString())
  .option(HoodieIndexConfig.SIMPLE_INDEX_PARALLELISM_PROP, 200)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, true)
  .option(HoodieWriteConfig.UPSERT_PARALLELISM, customNumPartitions)
  .option(HoodieWriteConfig.COMBINE_BEFORE_UPSERT_PROP, false)
  .option(HoodieWriteConfig.WRITE_BUFFER_LIMIT_BYTES, 41943040)
  .option(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, 100)
  .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, true)
  .mode(SaveMode.Append)
  .save(s"$basePath/$tableName/")
Bulk-Insert :
------------
userSegDf.write
  .format("hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, keyGenClass)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, partitionKey)
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
  .option(HoodieWriteConfig.TABLE_NAME, tableName)
  .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.SIMPLE.toString())
  .option(HoodieIndexConfig.SIMPLE_INDEX_PARALLELISM_PROP, 200)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, true)
  .option(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP, false)
  .option(HoodieWriteConfig.WRITE_BUFFER_LIMIT_BYTES, 41943040)
  .option(HoodieCompactionConfig.COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE, 100)
  .option(HoodieWriteConfig.BULKINSERT_SORT_MODE, BulkInsertSortMode.NONE.toString())
  .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, true)
  .mode(SaveMode.Overwrite)
  .save(s"$basePath/$tableName/")
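As an aside, the two writer calls above share most of their options; a sketch of factoring the shared ones into a single Map of plain string config keys, which `.options(...)` can consume. The string key names here are my assumption based on recent Hudi releases (the constant names vary across versions), and the field values are placeholders, not the actual table's:

```scala
// Sketch only: shared Hudi options as plain string config keys, reusable by
// both the upsert and the bulk-insert writer via .options(commonOpts).
// Key names assumed from recent Hudi releases -- verify against the
// HoodieWriteConfig / HoodieIndexConfig constants in your version.
val tableName    = "user_seg"    // placeholder table name
val key          = "user_id"     // placeholder record key field
val partitionKey = "segment"     // placeholder partition path field
val combineKey   = "updated_at"  // placeholder precombine field

val commonOpts: Map[String, String] = Map(
  "hoodie.table.name"                               -> tableName,
  "hoodie.datasource.write.table.type"              -> "COPY_ON_WRITE",
  "hoodie.datasource.write.recordkey.field"         -> key,
  "hoodie.datasource.write.partitionpath.field"     -> partitionKey,
  "hoodie.datasource.write.precombine.field"        -> combineKey,
  "hoodie.index.type"                               -> "SIMPLE",
  "hoodie.simple.index.parallelism"                 -> "200",
  "hoodie.datasource.write.hive_style_partitioning" -> "true"
)

// Each call then only adds its per-operation overrides, e.g.:
// userSegDf.write.format("hudi").options(commonOpts)
//   .option("hoodie.datasource.write.operation", "upsert")
//   ...
```

This also makes it easier to diff the two configurations when debugging, since only the operation-specific options remain inline.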
Using the SIMPLE index helped a bit, but the stage below has now been running
for more than 2 hours; it is progressing, though very slowly:
https://github.com/apache/hudi/blob/3e71c915271d77c7306ca0325b212f71ce723fc0/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L154
Let me know in case any more details are required.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]