dataproblems opened a new issue, #12320:
URL: https://github.com/apache/hudi/issues/12320
**Describe the problem you faced**
I'm creating a table using INSERT mode with the record level index enabled. The data and partition files are written to S3, but the job then fails while appending records to the record index log.
**To Reproduce**
Steps to reproduce the behavior:
1. `df.write.format("hudi").options(...).save("...")`
**Expected behavior**
I should be able to create the record level index
**Environment Description**
* Hudi version : 0.15.0
* Spark version : 3.4
* Hive version : N/A
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) :
**Additional context**
### Hoodie options
```
Map(
  DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
  HoodieStorageConfig.PARQUET_COMPRESSION_CODEC_NAME.key() -> "snappy",
  HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> "2147483648",
  "hoodie.parquet.small.file.limit" -> "1073741824",
  HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key() -> "true",
  HoodieIndexConfig.INDEX_TYPE.key() -> "RECORD_INDEX",
  "hoodie.metadata.enable" -> "true",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.metadata.record.index.enable" -> "true",
  HoodieTableConfig.POPULATE_META_FIELDS.key() -> "true",
  HoodieWriteConfig.MARKERS_TYPE.key() -> "DIRECT",
  DataSourceWriteOptions.OPERATION.key() -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  "hoodie.metadata.record.index.max.filegroup.count" -> "100000",
  // I have ~10 TB of data and am trying to keep the record index log files around 400 MB each.
  "hoodie.metadata.record.index.min.filegroup.count" -> "7500"
)
```
**Stacktrace**
```
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://SomeS3Path/.hoodie/metadata/record_index/.record-index-0195-0_00000000000000012.log.2_912-39-236765
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:466)
    at org.apache.hudi.io.HoodieAppendHandle.flushToDiskIfRequired(HoodieAppendHandle.java:599)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:428)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:337)
    ... 29 more
Caused by: org.apache.hudi.exception.HoodieIOException: IOException serializing records
    at org.apache.hudi.common.util.HFileUtils.lambda$serializeRecordsToLogBlock$0(HFileUtils.java:219)
    at java.util.TreeMap.forEach(TreeMap.java:1005)
    at org.apache.hudi.common.util.HFileUtils.serializeRecordsToLogBlock(HFileUtils.java:213)
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:108)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:117)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:163)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:458)
    ... 33 more
Caused by: java.io.IOException: Added a key not lexically larger than previous.
```
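For context on the root cause: the record index is serialized into HFile log blocks, and the HFile writer requires keys to be appended in strictly increasing lexicographic order, so the "Added a key not lexically larger than previous" error means an out-of-order key reached the writer. A minimal plain-Scala sketch of that invariant (the object and method names here are illustrative, not Hudi internals):

```scala
// Sketch of the ordering invariant the HFile writer enforces: each appended
// key must compare lexicographically greater than the one before it.
object SortedKeyCheck {
  // Returns the first adjacent pair that violates strict ascending order, if any.
  def firstOutOfOrder(keys: Seq[String]): Option[(String, String)] =
    keys.sliding(2).collectFirst {
      case Seq(prev, next) if next <= prev => (prev, next)
    }

  def main(args: Array[String]): Unit = {
    // Keys already in sorted order append cleanly.
    assert(firstOutOfOrder(Seq("key-001", "key-002", "key-003")).isEmpty)
    // An out-of-order (or duplicate) key is what trips the
    // "Added a key not lexically larger than previous" check.
    assert(firstOutOfOrder(Seq("key-002", "key-001")) == Some(("key-002", "key-001")))
  }
}
```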