nsivabalan commented on code in PR #18061:
URL: https://github.com/apache/hudi/pull/18061#discussion_r2751769694


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCreateRecordUtils.scala:
##########
@@ -153,8 +153,17 @@ object HoodieCreateRecordUtils {
               val orderingVal = OrderingValues.create(
                 orderingFields,
                 JFunction.toJavaFunction[String, Comparable[_]](
-                  field => HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
-                    consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]))
+                  field => {
+                    val fieldVal = HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
+                      consistentLogicalTimestampEnabled)
+                    if (fieldVal == null) {
+                      throw new IllegalArgumentException(
+                        s"Precombine/ordering field '$field' has null value for record key '${hoodieKey.getRecordKey}'. " +
+                          s"Please ensure all records have non-null values for the precombine field, " +
+                          s"or use a payload class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).")

Review Comment:
   In previous versions of Hudi, it was common for users to configure a precombine field even with OverwriteWithLatestAvroPayload. In later versions, we fully relaxed that constraint: it is fine either to not configure a precombine field at all, or to configure one where some records have null values, as long as the payload is OverwriteWithLatestAvroPayload or the merge mode is RecordMergeMode.COMMIT_TIME_ORDERING.
   
   
   So, if we throw an exception here, we at least need to check the payload class and the merge mode first, and throw only when an ordering value is actually required.
   For example, with OverwriteWithLatestAvroPayload or RecordMergeMode.COMMIT_TIME_ORDERING, we don't want to throw an exception.
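   A minimal sketch of the suggested check, assuming the payload class name and merge mode are available where the ordering value is computed. Everything here is hypothetical: `OrderingCheckSketch`, `MergeMode`, `CommitTimeOrdering`, `EventTimeOrdering`, `requiresOrdering` and `resolveOrderingValue` are simplified stand-ins for Hudi's `RecordMergeMode` and write-config lookups, not the real API. Returning `None` stands in for whatever default ordering value the caller would fall back to.

```scala
// Hypothetical sketch; not Hudi's actual API or the PR's implementation.
object OrderingCheckSketch {

  // Simplified stand-in for Hudi's RecordMergeMode (only the two modes discussed).
  sealed trait MergeMode
  case object CommitTimeOrdering extends MergeMode
  case object EventTimeOrdering extends MergeMode

  // Payload that keeps the latest write and does not need an ordering value.
  val OverwriteWithLatestPayload: String =
    "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload"

  // Ordering matters only when neither the payload nor the merge mode ignores it.
  def requiresOrdering(payloadClass: String, mergeMode: MergeMode): Boolean =
    payloadClass != OverwriteWithLatestPayload && mergeMode != CommitTimeOrdering

  // Returns the ordering value, None if it is null but not needed,
  // and throws only when a null value would actually break merging.
  def resolveOrderingValue(
      field: String,
      fieldVal: Any,
      recordKey: String,
      payloadClass: String,
      mergeMode: MergeMode): Option[Comparable[_]] = {
    if (fieldVal != null) {
      Some(fieldVal.asInstanceOf[Comparable[_]])
    } else if (requiresOrdering(payloadClass, mergeMode)) {
      throw new IllegalArgumentException(
        s"Precombine/ordering field '$field' has null value for record key '$recordKey'.")
    } else {
      None // hypothetical: caller substitutes a default ordering value
    }
  }
}
```

   With this shape, a null ordering value passes silently for OverwriteWithLatestAvroPayload or commit-time ordering, and still fails fast for event-time ordering, which is where a missing value would silently change merge results.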
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
