prashantwason opened a new issue, #18060:
URL: https://github.com/apache/hudi/issues/18060
## Problem Description
When records have null values in the precombine field, Hudi jobs fail with a
cryptic error message that makes it difficult for users to diagnose the root
cause:
```
org.apache.hudi.exception.HoodieException: Could not create payload for
class: org.apache.hudi.common.model.DefaultHoodieRecordPayload
Caused by: org.apache.hudi.exception.HoodieException: Ordering value is null
for record: ...
```
This error provides no actionable information about:
- Which precombine field has the null value
- Which record is problematic (record key)
- How to remediate the issue
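A minimal reproduction sketch of the scenario described above, assuming a standard Spark datasource write with an ordering-aware payload such as `DefaultHoodieRecordPayload` (the table path, table name, and column names are made up for illustration; the `hoodie.*` options are standard Hudi Spark write options):
```scala
import org.apache.spark.sql.SparkSession

object NullPrecombineRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("null-precombine-repro")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // A single record whose precombine column (`ts`) is null.
    val df = Seq(("abc123", "some-data", Option.empty[Long]))
      .toDF("uuid", "data", "ts")

    // With an ordering-aware payload (e.g. DefaultHoodieRecordPayload),
    // this write currently surfaces the cryptic
    // "Could not create payload for class ..." error shown above.
    df.write.format("hudi")
      .option("hoodie.table.name", "null_precombine_demo")
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .mode("append")
      .save("/tmp/null_precombine_demo")

    spark.stop()
  }
}
```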
## Root Cause
`BaseAvroPayload`'s constructor requires a non-null `orderingVal` parameter.
When records have null values in the precombine field,
`HoodieAvroUtils.getNestedFieldVal()` returns null, which causes payload
instantiation to fail with the confusing error message above.
The relevant code path in `HoodieCreateRecordUtils.scala`:
```scala
val hoodieRecord = if (shouldCombine && !orderingFields.isEmpty) {
  val orderingVal = OrderingValues.create(
    orderingFields,
    JFunction.toJavaFunction[String, Comparable[_]](
      field => HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
        consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]))
  // ... creates payload, which fails if orderingVal contains null
}
```
## Proposed Solution
Add an explicit null check with a clear, actionable error message before
attempting payload creation. The new error message should:
- Identify the specific precombine field that has the null value
- Provide the record key to help locate the problematic record
- Suggest remediation options (fix the data, or use a different payload
  class such as `OverwriteWithLatestAvroPayload`)
Example improved error message:
```
Precombine field 'ts' has null value for record key 'abc123'. Please ensure
all records have non-null values for the precombine field, or use a payload
class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).
```
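A rough sketch of such a check, assuming it runs right before `OrderingValues.create` in the snippet above. The helper name `validateOrderingValues` and its signature are hypothetical; `HoodieAvroUtils.getNestedFieldVal` and `HoodieException` are existing Hudi classes:
```scala
import org.apache.avro.generic.GenericRecord
import org.apache.hudi.avro.HoodieAvroUtils
import org.apache.hudi.exception.HoodieException

object PrecombineNullCheck {
  // Hypothetical helper: fail fast with an actionable message if any
  // precombine/ordering field resolves to null for this record.
  def validateOrderingValues(avroRec: GenericRecord,
                             orderingFields: Seq[String],
                             recordKey: String,
                             consistentLogicalTimestampEnabled: Boolean): Unit = {
    orderingFields.foreach { field =>
      // `returnNullIfNotFound = true` so a missing field is reported the
      // same way as an explicit null value.
      val value = HoodieAvroUtils.getNestedFieldVal(
        avroRec, field, true, consistentLogicalTimestampEnabled)
      if (value == null) {
        throw new HoodieException(
          s"Precombine field '$field' has null value for record key '$recordKey'. " +
            "Please ensure all records have non-null values for the precombine field, " +
            "or use a payload class that doesn't require ordering " +
            "(e.g., OverwriteWithLatestAvroPayload).")
      }
    }
  }
}
```
Performing the check before `OrderingValues.create` keeps the record key in scope for the message, instead of surfacing the generic "Ordering value is null" failure from `BaseAvroPayload`. The equivalent check would apply to the Flink payload creation path.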
## Affected Components
- Spark: `HoodieCreateRecordUtils.scala`
- Flink: Payload creation utilities
## Impact
This is a usability improvement that helps users quickly diagnose and fix
data quality issues in their ingestion pipelines.