ZZZxDong commented on issue #18819: URL: https://github.com/apache/hudi/issues/18819#issuecomment-4777781777
Hi @rahil-c, I'd like to work on this. 👋

I dug into the code: there's currently no type-based validation for record-key fields anywhere. `HoodieOptionConfig.validateTable` (DDL path) only checks that the column exists, `SimpleKeyGenerator.validateRecordKey` only checks non-empty/single-field, and `KeyGenUtils.getRecordKey` only checks that the runtime value is non-null. So a BLOB column (a struct under the hood) passes all of these and gets stringified into the record key, exactly as described.

My plan is to add a fail-fast check that rejects BLOB-typed columns as record keys, covering both the DDL path (`primaryKey` TBLPROPERTY) and the DataSource path (`hoodie.datasource.write.recordkey.field`), with a clear error naming the offending column.

A couple of scope questions before I open a PR:

1. Should the same rejection also apply to **partition-path fields** (and `preCombine`/ordering fields), or strictly the record key for now?
2. Is rejecting **all** BLOB record keys the desired behavior, or should EXTERNAL/reference BLOBs be treated differently from INLINE?

Happy to take guidance. I will send a PR once the scope is confirmed.
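To make the proposed fail-fast check concrete, here is a rough standalone sketch in Java. It is not Hudi code: `RecordKeyTypeCheck`, `ColumnType`, and `validateRecordKeyTypes` are hypothetical names, and the table schema is modeled as a plain map instead of the schema types the real DDL and DataSource paths would see. It assumes column existence is already validated elsewhere, so unknown fields are simply skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a type-based record-key check; none of these names exist in Hudi.
final class RecordKeyTypeCheck {

    // Stand-in for a column's logical type. A real check would inspect the table schema.
    enum ColumnType { STRING, LONG, BLOB }

    /**
     * Rejects every configured record-key field whose column type is BLOB.
     *
     * @param schema          column name to type (assumed shape, for illustration only)
     * @param recordKeyFields comma-separated field names, as they would arrive from
     *                        `primaryKey` or `hoodie.datasource.write.recordkey.field`
     */
    static void validateRecordKeyTypes(Map<String, ColumnType> schema, String recordKeyFields) {
        List<String> offending = new ArrayList<>();
        for (String raw : recordKeyFields.split(",")) {
            String field = raw.trim();
            if (field.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            // Unknown fields map to null and are skipped: existence is checked elsewhere.
            if (schema.get(field) == ColumnType.BLOB) {
                offending.add(field);
            }
        }
        if (!offending.isEmpty()) {
            // Name the offending column(s) so the user can fix the config directly.
            throw new IllegalArgumentException(
                "Record key field(s) " + offending + " have type BLOB, which cannot be used as a record key");
        }
    }

    public static void main(String[] args) {
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        schema.put("id", ColumnType.STRING);
        schema.put("payload", ColumnType.BLOB);

        validateRecordKeyTypes(schema, "id"); // passes silently
        try {
            validateRecordKeyTypes(schema, "id,payload");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Collecting all offending fields before throwing means a composite key with several BLOB columns is reported in one error. Whether the same helper should also be applied to partition-path and ordering fields depends on the answer to question 1.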
