prashantwason opened a new issue, #18060:
URL: https://github.com/apache/hudi/issues/18060

   ## Problem Description
   
   When records have null values in the precombine field, Hudi jobs fail with a
   cryptic error message that makes it difficult for users to diagnose the root
   cause:
   
   ```
   org.apache.hudi.exception.HoodieException: Could not create payload for class: org.apache.hudi.common.model.DefaultHoodieRecordPayload
   Caused by: org.apache.hudi.exception.HoodieException: Ordering value is null for record: ...
   ```
   
   This error provides no actionable information about:
   - Which precombine field has the null value
   - Which record is problematic (record key)
   - How to remediate the issue
   
   ## Root Cause
   
   `BaseAvroPayload`'s constructor requires a non-null `orderingVal` parameter.
   When records have null values in the precombine field,
   `HoodieAvroUtils.getNestedFieldVal()` returns null, which causes payload
   instantiation to fail with the confusing error message above.
   
   The relevant code path in `HoodieCreateRecordUtils.scala`:
   ```scala
   val hoodieRecord = if (shouldCombine && !orderingFields.isEmpty) {
     val orderingVal = OrderingValues.create(
       orderingFields,
       JFunction.toJavaFunction[String, Comparable[_]](
         field => HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
           consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]))
     // ... creates payload which fails if orderingVal contains null
   }
   ```
   
   ## Proposed Solution
   
   Add an explicit null check, with a clear and actionable error message, before
   attempting payload creation. The new error message should:
   - Identify the specific precombine field that has a null value
   - Provide the record key to help locate the problematic record
   - Suggest remediation options (fix the data or use a different payload class
     like `OverwriteWithLatestAvroPayload`)
   
   Example improved error message:
   ```
   Precombine field 'ts' has null value for record key 'abc123'. Please ensure all records have non-null values for the precombine field, or use a payload class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).
   ```
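
   A minimal sketch of what such a check could look like, written without any
   Hudi dependencies so it can be read on its own. Everything here is
   hypothetical and for illustration only: `OrderingValueValidator`,
   `NullOrderingValueException` (standing in for `HoodieException`), and the
   `lookup` function (standing in for the
   `HoodieAvroUtils.getNestedFieldVal(...)` call shown above) are assumptions,
   not existing Hudi APIs.

   ```scala
   // Hypothetical helper; names are illustrative and not part of Hudi's API.
   object OrderingValueValidator {

     /** Stand-in for HoodieException in this sketch. */
     final class NullOrderingValueException(message: String)
       extends RuntimeException(message)

     /** Builds the actionable message proposed in this issue. */
     def nullOrderingMessage(field: String, recordKey: String): String =
       s"Precombine field '$field' has null value for record key '$recordKey'. " +
         "Please ensure all records have non-null values for the precombine field, " +
         "or use a payload class that doesn't require ordering " +
         "(e.g., OverwriteWithLatestAvroPayload)."

     /**
      * Resolves each ordering field and fails fast on the first null value,
      * before any payload would be constructed.
      * `lookup` is assumed to return null when the field value is missing.
      */
     def validate(
         orderingFields: Seq[String],
         recordKey: String,
         lookup: String => Any): Unit =
       orderingFields.foreach { field =>
         if (lookup(field) == null) {
           throw new NullOrderingValueException(nullOrderingMessage(field, recordKey))
         }
       }
   }
   ```

   In the Spark path, a check along these lines would presumably run just before
   `OrderingValues.create(...)`, with the record key taken from the already
   extracted `HoodieKey`; the exact placement is left open here.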
   
   ## Affected Components
   - Spark: `HoodieCreateRecordUtils.scala`
   - Flink: Payload creation utilities
   
   ## Impact
   This is a usability improvement that helps users quickly diagnose and fix
   data quality issues in their ingestion pipelines.

