hudi-agent commented on code in PR #18738:
URL: https://github.com/apache/hudi/pull/18738#discussion_r3239985547


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/TimestampBasedAvroKeyGenerator.java:
##########
@@ -73,7 +73,7 @@ public enum TimestampType implements Serializable {
   protected final boolean encodePartitionPath;
 
   public TimestampBasedAvroKeyGenerator(TypedProperties config) throws 
IOException {
-    this(config, 
config.getString(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key()),
+    this(config, 
config.getString(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key(), null),

Review Comment:
   🤖 This constructor is in hudi-client-common and is reachable from Spark / 
generic key-gen reflection paths, not just Flink. Previously, 
`TypedProperties.getString(key)` calls `checkKey()` and throws a clear error if 
`hoodie.datasource.write.recordkey.field` is missing; with the `null` default 
it now silently passes `null` down to `SimpleAvroKeyGenerator`, where 
`recordKeyFields` becomes an empty list and a later `getRecordKey()` would fail 
with `IndexOutOfBoundsException` on `.get(0)`. Could the nullable handling be 
scoped to the Flink AutoKeyGen path so Spark / other engines still get the 
explicit fast-fail when the record-key option is genuinely missing? @yihua any 
concern here for non-Flink callers?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -314,9 +314,6 @@ public void createTable(ObjectPath tablePath, 
CatalogBaseTable catalogTable, boo
     Configuration conf = Configuration.fromMap(options);
     conf.set(FlinkOptions.PATH, tablePathStr);
     ResolvedSchema resolvedSchema = resolvedTable.getResolvedSchema();

Review Comment:
   🤖 Removing this check means non-append catalog tables created without a 
primary key (and without `recordkey.field`) now succeed at create-time and only 
fail later at sink/write time via `HoodieTableFactory.checkRecordKey`. Could we 
keep an early validation here that only allows the no-PK path when `OPERATION` 
is `insert` (append mode), so users still get a clear error at `CREATE TABLE`? 
Otherwise an upsert/MoR table can be silently created in the catalog and only 
blows up on the first write.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##########
@@ -157,13 +159,31 @@ public static boolean 
isDefaultHoodieRecordPayloadClazz(Configuration conf) {
    * Return value of {@link FlinkOptions#RECORD_KEY_FIELD} if it was set,
    * or throw exception otherwise.
    */
+  @Nullable
   public static String getRecordKeyStr(Configuration conf) {
+    return conf.get(FlinkOptions.RECORD_KEY_FIELD);
+  }
+
+  /**
+   * Return the record keys as an array.
+   */
+  public static String[] getRecordKeys(Configuration conf) {
     final String recordKeyStr = conf.get(FlinkOptions.RECORD_KEY_FIELD);
-    ValidationUtils.checkArgument(
-        recordKeyStr != null,
-        "Primary key definition is required, use either PRIMARY KEY syntax or 
option '"
-            + FlinkOptions.RECORD_KEY_FIELD.key() + "' to specify.");
-    return recordKeyStr;
+    if (StringUtils.isNullOrEmpty(recordKeyStr)) {
+      return new String[]{};
+    }
+    return recordKeyStr.split(",");
+  }
+
+  /**
+   * Return the bucket index keys as an array.
+   */
+  public static String[] getBucketIndexKeys(Configuration conf) {
+    final String indexKeyStr = conf.get(FlinkOptions.INDEX_KEY_FIELD);
+    if (StringUtils.isNullOrEmpty(indexKeyStr)) {
+      return new String[]{};

Review Comment:
   🤖 Minor: `recordKeyStr.split(",")` doesn't trim whitespace, so `"uuid, 
name"` becomes `{"uuid", " name"}` (the new test even asserts this). Downstream 
checks like `fields.contains(field)` in `HoodieTableFactory.checkRecordKey` 
would then fail for the trimmed name even though the user clearly intended 
`name`. Pre-existing in `getRecordKeyStr().split(",")` callers, but since this 
is being centralized into a helper, would it be worth `.map(String::trim)` here?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##########
@@ -157,13 +159,31 @@ public static boolean 
isDefaultHoodieRecordPayloadClazz(Configuration conf) {
    * Return value of {@link FlinkOptions#RECORD_KEY_FIELD} if it was set,
    * or throw exception otherwise.
    */
+  @Nullable

Review Comment:
   🤖 nit: the Javadoc directly above this annotation still says "or throw 
exception otherwise", which is now incorrect — could you update it to reflect 
that the method returns `null` when the key is unset?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to