linliu-code commented on code in PR #12384:
URL: https://github.com/apache/hudi/pull/12384#discussion_r1864750837


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java:
##########
@@ -83,7 +86,26 @@ public Object getValue(InternalRow row, Schema schema, String fieldName) {
 
   @Override
   public String getRecordKey(InternalRow row, Schema schema) {
-    return getFieldValueFromInternalRow(row, schema, RECORD_KEY_METADATA_FIELD).toString();
+    Object key = getFieldValueFromInternalRow(row, schema, RECORD_KEY_METADATA_FIELD);
+    if (key != null) {
+      return key.toString();
+    }
+    return null;
+  }
+
+  @Override
+  public String getRecordKey(InternalRow row, Schema schema, TypedProperties props) {
+    String key = getRecordKey(row, schema);

Review Comment:
   In my experiments, the schema may contain the metadata fields while the data has no such columns, so those metadata fields come back NULL when read from disk. When we try to get the record key, only `_hoodie_record_key` is consulted, which returns NULL and causes the merge to fail. To avoid this problem, we should fall back to the key field itself; it can be resolved in different ways, e.g. from the table config or from the write config.
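   A minimal sketch of the fallback being suggested, with a plain `Map` standing in for `InternalRow` and `java.util.Properties` for `TypedProperties` so it runs standalone; the config key name `hoodie.table.recordkey.fields` and the class/method names here are illustrative assumptions, not the actual Hudi code:

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Properties;

   public class RecordKeyFallbackSketch {
     // Name of the populated-on-write metadata column holding the record key.
     static final String RECORD_KEY_METADATA_FIELD = "_hoodie_record_key";

     // Hypothetical fallback: try the metadata column first (as the current
     // code does); if it is NULL because it was never materialized on disk,
     // resolve the key field name from the config and read the data column.
     static String getRecordKey(Map<String, Object> row, Properties props) {
       Object key = row.get(RECORD_KEY_METADATA_FIELD);
       if (key != null) {
         return key.toString();
       }
       // Assumed config key; in practice this could come from the table
       // config or the write config, as the comment notes.
       String keyField = props.getProperty("hoodie.table.recordkey.fields");
       if (keyField != null) {
         Object dataKey = row.get(keyField);
         return dataKey == null ? null : dataKey.toString();
       }
       return null;
     }

     public static void main(String[] args) {
       Map<String, Object> row = new HashMap<>();
       row.put(RECORD_KEY_METADATA_FIELD, null); // metadata column never populated
       row.put("uuid", "key-001");               // the real key lives in the data

       Properties props = new Properties();
       props.setProperty("hoodie.table.recordkey.fields", "uuid");

       System.out.println(getRecordKey(row, props)); // prints key-001
     }
   }
   ```

   Whether the key-field name should come from `TypedProperties` or from the table config is exactly the design choice the comment raises; the sketch only shows the shape of the null-check-then-fallback flow.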



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
