ZZZxDong opened a new pull request, #19056:
URL: https://github.com/apache/hudi/pull/19056
### Describe the issue this Pull Request addresses
Closes #18819
A BLOB column (a struct under the hood) silently passed all existing key-field validation. `CREATE TABLE ... TBLPROPERTIES (primaryKey = '<blob_col>')` and the equivalent DataSource writes succeeded, producing a JSON-stringified struct as the `_hoodie_record_key` (e.g. `{"type":"INLINE","data":"hello-0","reference":null}`).
A BLOB holds a large binary payload, either INLINE bytes or an EXTERNAL reference. Using one as a key either inflates the record key, shuffle data, and metadata index (INLINE) or ties record identity to a storage path rather than to content (EXTERNAL). It is therefore not a valid record key, ordering (preCombine), or partition path field.
### Summary and Changelog
Reject BLOB-typed columns used as record key, ordering (preCombine), or partition path fields, failing fast with a clear message on both write paths:
- DDL: `HoodieOptionConfig.validateTable` (record key + ordering) and
`HoodieCatalogTable` (partition path).
- DataSource: `HoodieSparkSqlWriter` `writeInternal` and `bootstrap`.
Adds `HoodieSchemaUtils.isBlobField` / `findBlobFields` helpers (top-level, case-insensitive, comma-separated multi-field aware). INLINE and EXTERNAL blobs are treated identically. Note: `PARTITIONED BY (<blob>)` is already rejected earlier by Spark (struct partition columns are disallowed); this change covers the `hoodie.datasource.write.partitionpath.field` route.
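
For readers who want a concrete picture of the helper semantics described above (top-level lookup, case-insensitive matching, comma-separated field lists), here is a minimal, self-contained sketch. It is not the code in this PR: the class `BlobFieldCheckSketch`, the `FieldKind` enum, the `Map`-based schema, and the error message wording are all hypothetical stand-ins, and the real helpers operate on Hudi's own schema types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch only; not the actual HoodieSchemaUtils implementation.
final class BlobFieldCheckSketch {

  // Stand-in for a column type tag. The real code inspects Hudi's schema model.
  enum FieldKind { STRING, LONG, BLOB }

  // Returns the configured names (as written) that resolve to top-level BLOB columns.
  // `configuredFields` is a comma-separated list such as "id, payload".
  static List<String> findBlobFields(Map<String, FieldKind> topLevelFields, String configuredFields) {
    List<String> blobs = new ArrayList<>();
    if (configuredFields == null || configuredFields.trim().isEmpty()) {
      return blobs;
    }
    // Index columns by lower-cased name so the lookup is case-insensitive.
    Map<String, FieldKind> byLowerName = new LinkedHashMap<>();
    for (Map.Entry<String, FieldKind> e : topLevelFields.entrySet()) {
      byLowerName.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
    }
    for (String raw : configuredFields.split(",")) {
      String name = raw.trim();
      if (name.isEmpty()) {
        continue;
      }
      // Only top-level names are looked up; a nested path like "a.b" never matches here.
      if (byLowerName.get(name.toLowerCase(Locale.ROOT)) == FieldKind.BLOB) {
        blobs.add(name);
      }
    }
    return blobs;
  }

  // Fails fast if any field configured for `role` is a BLOB column.
  // `role` is a free-form label, e.g. "record key", "ordering", or "partition path".
  static void validateNotBlob(Map<String, FieldKind> schema, String role, String configuredFields) {
    List<String> blobs = findBlobFields(schema, configuredFields);
    if (!blobs.isEmpty()) {
      throw new IllegalArgumentException(
          "BLOB column(s) " + blobs + " cannot be used as " + role + " field");
    }
  }
}
```

With a schema containing a STRING column `id` and a BLOB column `payload`, `findBlobFields(schema, "id, PAYLOAD")` in this sketch returns only `PAYLOAD`, and `validateNotBlob(schema, "record key", "payload")` throws, which mirrors the fail-fast behavior the summary describes for each of the three field roles.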
### Impact
User-facing: a CREATE TABLE / write that previously succeeded with a BLOB key field now fails fast with a clear error. This is the intended fix; such tables were already semantically broken.
### Risk Level
low: validation-only, with no storage format or read-path change. Covered by new tests in `TestBlobDataType`, `TestHoodieSparkSqlWriter`, and `TestHoodieOptionConfig`.
### Documentation Update
none
### Contributor's checklist
- [x] Read through contributor's guide
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable