ZZZxDong opened a new pull request, #19056:
URL: https://github.com/apache/hudi/pull/19056

   ### Describe the issue this Pull Request addresses
   
   Closes #18819
   
   A BLOB column (a struct under the hood) silently passed all existing key-field
   validation. `CREATE TABLE ... TBLPROPERTIES (primaryKey = '<blob_col>')` and the
   equivalent DataSource writes succeeded, producing a JSON-stringified struct as the
   `_hoodie_record_key` (e.g. `{"type":"INLINE","data":"hello-0","reference":null}`).
   
   BLOB holds large binary payloads, either INLINE bytes or EXTERNAL references.
   Using it as a key either balloons the record key, shuffle, and metadata index
   (INLINE) or ties record identity to a storage path rather than content
   (EXTERNAL). It is therefore not a valid record key, ordering/preCombine, or
   partition path field.
   
   ### Summary and Changelog
   
   Reject BLOB-typed columns used as record key, ordering (preCombine), or partition
   path fields, failing fast with a clear message on both write paths:
   
   - DDL: `HoodieOptionConfig.validateTable` (record key + ordering) and
     `HoodieCatalogTable` (partition path).
   - DataSource: `HoodieSparkSqlWriter` `writeInternal` and `bootstrap`.
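
   The fail-fast check described above can be sketched roughly as follows. This is
   a minimal, self-contained illustration and not the code in this PR: the class
   `BlobKeyValidationSketch`, its `validate` signature, the use of
   `IllegalArgumentException`, and the message wording are all assumptions made for
   the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual HoodieOptionConfig / HoodieSparkSqlWriter code.
final class BlobKeyValidationSketch {

    // blobColumnsLowerCase: lower-cased names of top-level BLOB columns (assumed input).
    // Each *Fields argument is a comma-separated config value, or null when unset.
    static void validate(Set<String> blobColumnsLowerCase,
                         String recordKeyFields,
                         String orderingFields,
                         String partitionPathFields) {
        // Keep insertion order so the first offending role is reported deterministically.
        Map<String, String> roles = new LinkedHashMap<>();
        roles.put("record key", recordKeyFields);
        roles.put("ordering (preCombine)", orderingFields);
        roles.put("partition path", partitionPathFields);

        for (Map.Entry<String, String> role : roles.entrySet()) {
            if (role.getValue() == null) {
                continue; // role not configured, nothing to check
            }
            for (String raw : role.getValue().split(",")) {
                String name = raw.trim();
                // Case-insensitive comparison against the known BLOB columns.
                if (blobColumnsLowerCase.contains(name.toLowerCase(Locale.ROOT))) {
                    // Fail before any data is written. The exception type is a stand-in.
                    throw new IllegalArgumentException(
                        "BLOB column '" + name + "' cannot be used as a "
                            + role.getKey() + " field");
                }
            }
        }
    }
}
```

   With `blobColumnsLowerCase = Set.of("payload")`, a call such as
   `validate(blobs, "id, Payload", null, null)` throws, while
   `validate(blobs, "id", "ts", "region")` returns normally.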
   
   Adds `HoodieSchemaUtils.isBlobField` / `findBlobFields` helpers (top-level,
   case-insensitive, comma-separated multi-field aware). INLINE and EXTERNAL blobs
   are treated identically. Note: `PARTITIONED BY (<blob>)` is already rejected
   earlier by Spark (struct partition columns are disallowed); this change covers
   the `hoodie.datasource.write.partitionpath.field` route.
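
   To make the lookup semantics concrete (top-level only, case-insensitive,
   comma-separated aware), here is a small sketch. It borrows the helper names from
   this PR, but the signatures and the schema model (a map from top-level column
   name to a logical type name) are assumptions for illustration; the real
   `HoodieSchemaUtils` helpers operate on Hudi/Spark schema types not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the helpers named in this PR.
final class BlobFieldLookupSketch {

    // Assumption: the schema is modeled as top-level column name -> logical type name.
    static boolean isBlobField(Map<String, String> topLevelTypes, String fieldName) {
        String wanted = fieldName.trim();
        for (Map.Entry<String, String> column : topLevelTypes.entrySet()) {
            // Case-insensitive match on top-level names only; nested paths are not resolved.
            if (column.getKey().equalsIgnoreCase(wanted)) {
                return "BLOB".equalsIgnoreCase(column.getValue());
            }
        }
        return false; // unknown columns are left to other validation
    }

    // Splits a comma-separated config value and returns the entries that name BLOB columns,
    // preserving the spelling the user wrote so it can be echoed in an error message.
    static List<String> findBlobFields(Map<String, String> topLevelTypes,
                                       String commaSeparatedFields) {
        List<String> blobs = new ArrayList<>();
        if (commaSeparatedFields == null) {
            return blobs;
        }
        for (String raw : commaSeparatedFields.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty() && isBlobField(topLevelTypes, name)) {
                blobs.add(name);
            }
        }
        return blobs;
    }
}
```

   For a schema with `id` (STRING), `payload` (BLOB), and `ts` (LONG),
   `findBlobFields(schema, "id, Payload ,ts")` returns `[Payload]`.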
   
   ### Impact
   
   User-facing: a CREATE TABLE / write that previously succeeded with a BLOB key
   field now fails fast with a clear error. This is the intended fix; such tables
   were already semantically broken.
   
   ### Risk Level
   
   low: validation-only, no storage format or read-path change. Covered by new
   tests in `TestBlobDataType`, `TestHoodieSparkSqlWriter`, and `TestHoodieOptionConfig`.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through contributor's guide
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   

