rahil-c opened a new issue, #18851:
URL: https://github.com/apache/hudi/issues/18851

   Part of #18676. RFC-104 / [design 
PR](https://github.com/chrevanthreddy/hudi/pull/1).
   
   ## Scope
   
   Wire up the on-disk record shape for the initial milestone. It carries only 
the minimum payload (cluster ID + raw vector); RaBitQ codes, scalars, and 
base-table pointers come later.
   
   ## Tasks
   
   - Add `HoodieVectorIndexInfo` Avro record in 
`hudi-common/src/main/avro/HoodieMetadata.avsc` after `SecondaryIndexMetadata` 
(line ~557):
     - `clusterId` (int)
     - `vector` (array of float)
     - `recordKey` is not a separate field: it is already the MDT record key, so there is no need to duplicate it
   - In 
`hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java`:
     - Add `SCHEMA_FIELD_ID_VECTOR_INDEX` field constant
     - Add a payload constructor `HoodieMetadataPayload(String key, 
HoodieVectorIndexInfo vectorMetadata)`
     - Add factory `createVectorIndexRecord(recordKey, clusterId, vector)` — 
mirror `createSecondaryIndexRecord` and `createRecordIndexUpdate` (line ~650)
   - Override `combineMetadataPayloads` on the new partition-type enum entry 
(latest-wins for now).
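
To make the intended shape concrete, here is a minimal plain-Java sketch of the tasks above. It is not the generated Avro class or the real `HoodieMetadataPayload` API: `VectorIndexSketch`, `VectorIndexInfo`, `Payload`, and `combine` are hypothetical stand-ins. They only illustrate the two-field record, a factory in the spirit of `createVectorIndexRecord`, and a latest-wins merge.

```java
import java.util.Arrays;

// Hypothetical stand-in for the proposed HoodieVectorIndexInfo record and its
// payload wiring. The real record would be generated from HoodieMetadata.avsc.
final class VectorIndexSketch {

    // Mirrors the two proposed fields: clusterId (int) and vector (array of float).
    record VectorIndexInfo(int clusterId, float[] vector) {
        VectorIndexInfo {
            // Copy the input so later changes to the caller's array
            // do not affect this record.
            vector = Arrays.copyOf(vector, vector.length);
        }
    }

    // Hypothetical payload: the record key lives on the payload (as the MDT
    // record key does), not inside VectorIndexInfo.
    record Payload(String recordKey, VectorIndexInfo info) {}

    // Sketch of a factory shaped like createVectorIndexRecord(recordKey, clusterId, vector).
    static Payload createVectorIndexRecord(String recordKey, int clusterId, float[] vector) {
        return new Payload(recordKey, new VectorIndexInfo(clusterId, vector));
    }

    // Latest-wins merge: the newer payload replaces the older one outright.
    static Payload combine(Payload older, Payload newer) {
        if (!older.recordKey().equals(newer.recordKey())) {
            throw new IllegalArgumentException("cannot combine payloads with different keys");
        }
        return newer;
    }

    public static void main(String[] args) {
        Payload v1 = createVectorIndexRecord("key-1", 3, new float[] {0.1f, 0.2f});
        Payload v2 = createVectorIndexRecord("key-1", 7, new float[] {0.4f, 0.5f});
        Payload merged = combine(v1, v2);
        System.out.println(merged.recordKey() + " -> cluster " + merged.info().clusterId());
    }
}
```

Because latest-wins discards the older payload entirely, no field-level merging is needed at this stage; that may change once RaBitQ codes or other fields are added.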
   
   ## Tests
   
   - Unit test in `TestHoodieMetadataPayload` for round-trip Avro serialization 
of `HoodieVectorIndexInfo`.
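
As a rough illustration of what the round-trip assertion could check, the sketch below encodes and decodes the two fields using `java.io` data streams. This is a swapped-in technique, not Avro: the actual test would go through the Avro-generated `HoodieVectorIndexInfo`, and `VectorRoundTripSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical round-trip check; java.io streams stand in for Avro encoding.
final class VectorRoundTripSketch {

    record Decoded(int clusterId, float[] vector) {}

    // Encode clusterId, then the vector length, then each float.
    static byte[] encode(int clusterId, float[] vector) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(clusterId);
            out.writeInt(vector.length);
            for (float f : vector) {
                out.writeFloat(f);
            }
        }
        return bytes.toByteArray();
    }

    // Decode in the same order the fields were written.
    static Decoded decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int clusterId = in.readInt();
            float[] vector = new float[in.readInt()];
            for (int i = 0; i < vector.length; i++) {
                vector[i] = in.readFloat();
            }
            return new Decoded(clusterId, vector);
        }
    }

    // True when both fields survive an encode/decode cycle unchanged.
    static boolean roundTrips(int clusterId, float[] vector) throws IOException {
        Decoded decoded = decode(encode(clusterId, vector));
        return decoded.clusterId() == clusterId && Arrays.equals(decoded.vector(), vector);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrips(42, new float[] {0.5f, -1.25f, 3.0f}));
    }
}
```

A real test would likely also cover an empty vector and a large dimension, since those are the cases where array encoding tends to go wrong.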
   
   ## Depends on
   
   - #PARENT (sub-issue 1 — partition type must be registered first)
   
   ## Out of scope
   
   KMeans training, bootstrap orchestration, DDL.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
