rahil-c opened a new issue, #18851: URL: https://github.com/apache/hudi/issues/18851
Part of #18676. RFC-104 / [design PR](https://github.com/chrevanthreddy/hudi/pull/1).

## Scope

Wire the on-disk record shape for the initial milestone. Only the minimum payload (cluster ID + raw vector); RaBitQ codes, scalars, base-table pointers come later.

## Tasks

- Add `HoodieVectorIndexInfo` Avro record in `hudi-common/src/main/avro/HoodieMetadata.avsc` after `SecondaryIndexMetadata` (line ~557):
  - `clusterId` (int)
  - `vector` (array of float)
  - `recordKey` is already the MDT record key; no need to duplicate
- In `hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java`:
  - Add `SCHEMA_FIELD_ID_VECTOR_INDEX` field constant
  - Add a payload constructor `HoodieMetadataPayload(String key, HoodieVectorIndexInfo vectorMetadata)`
  - Add factory `createVectorIndexRecord(recordKey, clusterId, vector)`; mirror `createSecondaryIndexRecord` and `createRecordIndexUpdate` (line ~650)
- Override `combineMetadataPayloads` on the new enum entry (latest-wins for now).

## Tests

- Unit test in `TestHoodieMetadataPayload` for round-trip Avro serialization of `HoodieVectorIndexInfo`.

## Depends on

- #PARENT (sub-issue 1; partition type must be registered first)

## Out of scope

KMeans training, bootstrap orchestration, DDL.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
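
For illustration, the record shape listed under Tasks might look roughly like the schema fragment below. This is only a sketch. The issue confirms the record name `HoodieVectorIndexInfo` and the two fields `clusterId` (int) and `vector` (array of float). The enclosing field name `VectorIndexMetadata`, the nullable union with a `null` default, and all `doc` strings are assumptions made for the example, not part of the issue or the RFC.

```json
{
  "name": "VectorIndexMetadata",
  "doc": "Hypothetical field name (assumption): vector index payload, cluster ID plus raw vector",
  "type": [
    "null",
    {
      "type": "record",
      "name": "HoodieVectorIndexInfo",
      "fields": [
        {
          "name": "clusterId",
          "doc": "Assumed doc string: cluster the vector is assigned to",
          "type": "int"
        },
        {
          "name": "vector",
          "doc": "Assumed doc string: raw vector values",
          "type": {
            "type": "array",
            "items": "float"
          }
        }
      ]
    }
  ],
  "default": null
}
```

No `recordKey` field appears in the sketch because, as the task list notes, the MDT record key already carries it.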
