dataproblems opened a new issue, #12320:
URL: https://github.com/apache/hudi/issues/12320
**Describe the problem you faced**
I'm creating a table using INSERT mode with the record level index enabled. The data and partition files are written to S3, but the job then fails while appending records to the record index log.
**To Reproduce**
Steps to reproduce the behavior:
1. `df.write.format("hudi").options(...).save("...")`
**Expected behavior**
I should be able to create the record level index
**Environment Description**
* Hudi version : 0.15.0
* Spark version : 3.4
* Hive version : N/A
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) :
**Additional context**
### Hoodie options
```
Map(
  DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
  HoodieStorageConfig.PARQUET_COMPRESSION_CODEC_NAME.key() -> "snappy",
  HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> "2147483648",
  "hoodie.parquet.small.file.limit" -> "1073741824",
  HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key() -> "true",
  HoodieIndexConfig.INDEX_TYPE.key() -> "RECORD_INDEX",
  "hoodie.metadata.enable" -> "true",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.metadata.record.index.enable" -> "true",
  HoodieTableConfig.POPULATE_META_FIELDS.key() -> "true",
  HoodieWriteConfig.MARKERS_TYPE.key() -> "DIRECT",
  DataSourceWriteOptions.OPERATION.key() -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  "hoodie.metadata.record.index.max.filegroup.count" -> "100000",
  // I have ~10 TB of data and am trying to keep the record index log files around 400 MB each.
  "hoodie.metadata.record.index.min.filegroup.count" -> "7500"
)
```
**Stacktrace**
```
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://SomeS3Path/.hoodie/metadata/record_index/.record-index-0195-0_00000000000000012.log.2_912-39-236765
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:466)
    at org.apache.hudi.io.HoodieAppendHandle.flushToDiskIfRequired(HoodieAppendHandle.java:599)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:428)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:337)
    ... 29 more
Caused by: org.apache.hudi.exception.HoodieIOException: IOException serializing records
    at org.apache.hudi.common.util.HFileUtils.lambda$serializeRecordsToLogBlock$0(HFileUtils.java:219)
    at java.util.TreeMap.forEach(TreeMap.java:1005)
    at org.apache.hudi.common.util.HFileUtils.serializeRecordsToLogBlock(HFileUtils.java:213)
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:108)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:117)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:163)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:458)
    ... 33 more
Caused by: java.io.IOException: Added a key not lexically larger than previous.
```
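For context on the root cause: the record index is serialized into HFile log blocks, and the HFile writer requires keys to be appended in strictly increasing lexicographic order, so the "Added a key not lexically larger than previous" error means an out-of-order key reached the writer. A minimal plain-Scala sketch of that invariant (the object and method names here are illustrative, not Hudi internals):

```scala
// Sketch of the ordering invariant the HFile writer enforces: each appended
// key must compare lexicographically greater than the one before it.
object SortedKeyCheck {
  // Returns the first adjacent pair that violates strict ascending order, if any.
  def firstOutOfOrder(keys: Seq[String]): Option[(String, String)] =
    keys.sliding(2).collectFirst {
      case Seq(prev, next) if next <= prev => (prev, next)
    }

  def main(args: Array[String]): Unit = {
    // Keys already in sorted order append cleanly.
    assert(firstOutOfOrder(Seq("key-001", "key-002", "key-003")).isEmpty)
    // An out-of-order (or duplicate) key is what trips the
    // "Added a key not lexically larger than previous" check.
    assert(firstOutOfOrder(Seq("key-002", "key-001")) == Some(("key-002", "key-001")))
  }
}
```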