dataproblems opened a new issue, #12252:
URL: https://github.com/apache/hudi/issues/12252
**To Reproduce**
Steps to reproduce the behavior:
1. Create a table with the record-level index enabled, using the insert operation
2. Create a single-row dataset and perform an upsert
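Step 2 might look roughly like the following. This is a hedged sketch, not code from the report: `singleRowDf`, `basePath`, and `upsertOptions` are illustrative names, and the option map reuses the table-creation options shown below with the operation switched to upsert.

```scala
// Illustrative sketch only; variable names are assumptions, not from the report.
// Reuse the table-creation options (insertOptions, shown below in this issue)
// but switch the operation to upsert so the record-level index is consulted
// for the key lookup.
val upsertOptions: Map[String, String] = insertOptions ++ Map(
  DataSourceWriteOptions.OPERATION.key() ->
    DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL
)

singleRowDf.write
  .format("hudi")
  .options(upsertOptions)
  .mode(SaveMode.Append)
  .save(basePath)
```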
**Expected behavior**
The upsert operation should complete within a minute or two.
**Environment Description**
* Hudi version : 0.14.0
* Spark version : 3.4
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Additional context**
#### Table creation options
```scala
val insertOptions: Map[String, String] = Map(
  DataSourceWriteOptions.OPERATION.key() ->
    DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.TABLE_TYPE.key() ->
    DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
  HoodieStorageConfig.PARQUET_COMPRESSION_CODEC_NAME.key() -> "snappy",
  HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> "2147483648",
  "hoodie.parquet.small.file.limit" -> "1073741824",
  HoodieTableConfig.POPULATE_META_FIELDS.key() -> "true",
  HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key() -> "true",
  HoodieIndexConfig.INDEX_TYPE.key() -> "RECORD_INDEX",
  "hoodie.metadata.record.index.enable" -> "true",
  "hoodie.metadata.enable" -> "true",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.datasource.write.partitionpath.field" -> "SomePartitionField",
  "hoodie.datasource.write.recordkey.field" -> "SomeRecordKey",
  "hoodie.datasource.write.precombine.field" -> "SomeTimestampField",
  "hoodie.table.name" -> "SomeTableName",
  DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key() ->
    classOf[SimpleKeyGenerator].getName,
  "hoodie.write.markers.type" -> "DIRECT",
  "hoodie.embed.timeline.server" -> "true",
  "hoodie.metadata.record.index.min.filegroup.count" -> "500" // This was data specific.
)
```
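For context on the last option: Hudi's record-level index shards record keys across a number of index file groups, and `hoodie.metadata.record.index.min.filegroup.count` sets a floor on that number. The following is a simplified, hypothetical sketch of how a key maps deterministically to a shard; `fileGroupFor` is an invented name and this is not Hudi's actual hashing code.

```scala
// Simplified illustration only -- NOT Hudi's actual implementation.
// Each record key is deterministically mapped to one of N index file
// groups, so an index lookup for a key only has to scan a single shard.
def fileGroupFor(recordKey: String, numFileGroups: Int): Int =
  Math.floorMod(recordKey.hashCode, numFileGroups)

// A given key always lands in the same shard, in the range [0, N):
val shard = fileGroupFor("SomeRecordKey-123", 500)
```

Raising the file group count spreads the index over more shards, which can help lookup parallelism on large key spaces at the cost of more small index file groups.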
#### Spark UI: Stages
*(screenshot from the original issue; not reproduced here)*

#### Stage Detail View
*(screenshot from the original issue; not reproduced here)*

#### Completed task metrics
*(screenshot from the original issue; not reproduced here)*
#### Commit file contents
I looked up a single record key from the table, created a dummy record for it, and then performed the upsert.
```json
{
  "partitionToWriteStats" : {
    "REDACTED" : [ {
      "fileId" : "REDACTED-0",
      "path" : "REDACTED.parquet",
      "cdcStats" : null,
      "prevCommit" : "20241113220830932",
      "numWrites" : 2098048,
      "numDeletes" : 0,
      "numUpdateWrites" : 1,
      "numInserts" : 0,
      "totalWriteBytes" : 241330130,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "REDACTED",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 241330130,
      "minEventTime" : null,
      "maxEventTime" : null,
      "runtimeStats" : {
        "totalScanTime" : 0,
        "totalUpsertTime" : 50020,
        "totalCreateTime" : 0
      }
    } ]
  },
  "compacted" : false,
  "extraMetadata" : {
    "schema" : "REDACTED"
  },
  "operationType" : "UPSERT"
}
```
Do you have any idea why this might be happening?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]