linliu-code commented on code in PR #12384:
URL: https://github.com/apache/hudi/pull/12384#discussion_r1864750837
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java:
##########
@@ -83,7 +86,26 @@ public Object getValue(InternalRow row, Schema schema, String fieldName) {
   @Override
   public String getRecordKey(InternalRow row, Schema schema) {
-    return getFieldValueFromInternalRow(row, schema, RECORD_KEY_METADATA_FIELD).toString();
+    Object key = getFieldValueFromInternalRow(row, schema, RECORD_KEY_METADATA_FIELD);
+    if (key != null) {
+      return key.toString();
+    }
+    return null;
+  }
+
+  @Override
+  public String getRecordKey(InternalRow row, Schema schema, TypedProperties props) {
+    String key = getRecordKey(row, schema);
Review Comment:
In my experiments, the schema may contain the metadata fields even though there
are no such columns in the data, so these metadata fields are NULL after being
read from disk. When we try to get the record key, only `_record_key` is used,
which yields NULL and causes the merge to fail. To avoid this problem, we should
try to find the key field; there are different ways to do so, either from the
table config or from the write config.
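To illustrate the fallback being suggested, here is a minimal, self-contained sketch (not Hudi's actual API): if the `_hoodie_record_key` metadata column is NULL, derive the key from the data column named in a config property instead of returning NULL. The row is modeled as a plain `Map`, and the config key `hoodie.table.recordkey.fields` plus the `uuid` field name are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the reviewer's suggestion: fall back to the
// configured key field when the metadata field is NULL in the data files.
public class RecordKeyFallback {
  static final String RECORD_KEY_METADATA_FIELD = "_hoodie_record_key";
  // Assumed config property name; Hudi's real table/write config may differ.
  static final String RECORD_KEY_FIELD_PROP = "hoodie.table.recordkey.fields";

  // A row is modeled as a field-name -> value map for this sketch.
  static String getRecordKey(Map<String, Object> row, Properties props) {
    Object key = row.get(RECORD_KEY_METADATA_FIELD);
    if (key != null) {
      return key.toString();
    }
    // Fallback: read the key field named in the config from the data columns.
    String keyField = props.getProperty(RECORD_KEY_FIELD_PROP);
    if (keyField != null) {
      Object dataKey = row.get(keyField);
      if (dataKey != null) {
        return dataKey.toString();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put(RECORD_KEY_METADATA_FIELD, null); // meta field present but NULL
    row.put("uuid", "key-123");               // actual key lives in the data
    Properties props = new Properties();
    props.setProperty(RECORD_KEY_FIELD_PROP, "uuid");
    System.out.println(getRecordKey(row, props)); // prints key-123
  }
}
```

The same lookup could consult the table config first and the write config second; the sketch collapses both into a single property for brevity.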
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]