hudi-agent commented on code in PR #18738:
URL: https://github.com/apache/hudi/pull/18738#discussion_r3239985547
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/TimestampBasedAvroKeyGenerator.java:
##########
@@ -73,7 +73,7 @@ public enum TimestampType implements Serializable {
protected final boolean encodePartitionPath;
public TimestampBasedAvroKeyGenerator(TypedProperties config) throws
IOException {
- this(config,
config.getString(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key()),
+ this(config,
config.getString(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key(), null),
Review Comment:
🤖 This constructor is in hudi-client-common and is reachable from Spark /
generic key-gen reflection paths, not just Flink. Previously,
`TypedProperties.getString(key)` calls `checkKey()` and throws a clear error if
`hoodie.datasource.write.recordkey.field` is missing; with the `null` default
it now silently passes `null` down to `SimpleAvroKeyGenerator`, where
`recordKeyFields` becomes an empty list and a later `getRecordKey()` would fail
with `IndexOutOfBoundsException` on `.get(0)`. Could the nullable handling be
scoped to the Flink AutoKeyGen path so Spark / other engines still get the
explicit fast-fail when the record-key option is genuinely missing? @yihua any
concern here for non-Flink callers?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java:
##########
@@ -314,9 +314,6 @@ public void createTable(ObjectPath tablePath,
CatalogBaseTable catalogTable, boo
Configuration conf = Configuration.fromMap(options);
conf.set(FlinkOptions.PATH, tablePathStr);
ResolvedSchema resolvedSchema = resolvedTable.getResolvedSchema();
Review Comment:
🤖 Removing this check means non-append catalog tables created without a
primary key (and without `recordkey.field`) now succeed at create-time and only
fail later at sink/write time via `HoodieTableFactory.checkRecordKey`. Could we
keep an early validation here that only allows the no-PK path when `OPERATION`
is `insert` (append mode), so users still get a clear error at `CREATE TABLE`?
Otherwise an upsert/MoR table can be silently created in the catalog and only
blows up on the first write.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##########
@@ -157,13 +159,31 @@ public static boolean
isDefaultHoodieRecordPayloadClazz(Configuration conf) {
* Return value of {@link FlinkOptions#RECORD_KEY_FIELD} if it was set,
* or throw exception otherwise.
*/
+ @Nullable
public static String getRecordKeyStr(Configuration conf) {
+ return conf.get(FlinkOptions.RECORD_KEY_FIELD);
+ }
+
+ /**
+ * Return the record keys as an array.
+ */
+ public static String[] getRecordKeys(Configuration conf) {
final String recordKeyStr = conf.get(FlinkOptions.RECORD_KEY_FIELD);
- ValidationUtils.checkArgument(
- recordKeyStr != null,
- "Primary key definition is required, use either PRIMARY KEY syntax or
option '"
- + FlinkOptions.RECORD_KEY_FIELD.key() + "' to specify.");
- return recordKeyStr;
+ if (StringUtils.isNullOrEmpty(recordKeyStr)) {
+ return new String[]{};
+ }
+ return recordKeyStr.split(",");
+ }
+
+ /**
+ * Return the bucket index keys as an array.
+ */
+ public static String[] getBucketIndexKeys(Configuration conf) {
+ final String indexKeyStr = conf.get(FlinkOptions.INDEX_KEY_FIELD);
+ if (StringUtils.isNullOrEmpty(indexKeyStr)) {
+ return new String[]{};
Review Comment:
🤖 Minor: `recordKeyStr.split(",")` doesn't trim whitespace, so `"uuid,
name"` becomes `{"uuid", " name"}` (the new test even asserts this). Downstream
checks like `fields.contains(field)` in `HoodieTableFactory.checkRecordKey`
would then fail for the trimmed name even though the user clearly intended
`name`. Pre-existing in `getRecordKeyStr().split(",")` callers, but since this
is being centralized into a helper, would it be worth `.map(String::trim)` here?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##########
@@ -157,13 +159,31 @@ public static boolean
isDefaultHoodieRecordPayloadClazz(Configuration conf) {
* Return value of {@link FlinkOptions#RECORD_KEY_FIELD} if it was set,
* or throw exception otherwise.
*/
+ @Nullable
Review Comment:
🤖 nit: the Javadoc directly above this annotation still says "or throw
exception otherwise", which is now incorrect — could you update it to reflect
that the method returns `null` when the key is unset?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]