ZZZxDong commented on issue #18819:
URL: https://github.com/apache/hudi/issues/18819#issuecomment-4777781777

   Hi @rahil-c, I'd like to work on this. 👋
   
   I dug into the code: there's currently no type-based validation for 
record-key fields anywhere — `HoodieOptionConfig.validateTable` (DDL path) only 
checks the column exists, `SimpleKeyGenerator.validateRecordKey` only checks 
non-empty/single-field, and `KeyGenUtils.getRecordKey` only checks the runtime 
value is non-null. So a BLOB column (a struct under the hood) passes all of 
these and gets stringified into the record key, exactly as described.
   
   My plan is to add a fail-fast check that rejects BLOB-typed columns as 
record keys, covering both the DDL path (`primaryKey` TBLPROPERTY) and the 
DataSource path (`hoodie.datasource.write.recordkey.field`), with a clear error 
naming the offending column.
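
   A minimal sketch of what such a fail-fast check could look like. Everything here is hypothetical: `RecordKeyTypeCheck`, `ColumnType`, and `validateRecordKeyTypes` are illustrative names, and a plain map stands in for the real table schema. It is not existing Hudi API. The sketch also assumes the record-key config may list several comma-separated fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Hudi.
final class RecordKeyTypeCheck {

    // Stand-in for a column's logical type; the real check would inspect the table schema.
    enum ColumnType { STRING, LONG, BLOB }

    // Splits a record-key config value into trimmed field names.
    // Assumption: multiple key fields are comma-separated.
    static List<String> parseKeyFields(String recordKeyConfig) {
        List<String> fields = new ArrayList<>();
        for (String part : recordKeyConfig.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                fields.add(name);
            }
        }
        return fields;
    }

    // Fails fast, before any write, if a configured record-key field is missing or BLOB-typed.
    static void validateRecordKeyTypes(Map<String, ColumnType> schema, String recordKeyConfig) {
        for (String field : parseKeyFields(recordKeyConfig)) {
            ColumnType type = schema.get(field);
            if (type == null) {
                throw new IllegalArgumentException(
                    "Record key field '" + field + "' does not exist in the table schema");
            }
            if (type == ColumnType.BLOB) {
                throw new IllegalArgumentException(
                    "Record key field '" + field + "' has type BLOB, which cannot be used as a record key");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        schema.put("id", ColumnType.STRING);
        schema.put("payload", ColumnType.BLOB);

        // A STRING key passes silently.
        validateRecordKeyTypes(schema, "id");

        // A BLOB key is rejected with an error that names the offending column.
        try {
            validateRecordKeyTypes(schema, "id, payload");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

   The same helper could in principle be called from both the DDL path and the DataSource path, so the two entry points share one error message.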
   
   A couple of scope questions before I open a PR:
   1. Should the same rejection also apply to **partition-path fields** (and 
`preCombine`/ordering fields), or strictly the record key for now?
   2. Is rejecting **all** BLOB record keys the desired behavior, or should 
EXTERNAL/reference BLOBs be treated differently from INLINE?
   
   Happy to take guidance; I'll send a PR once the scope is confirmed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
