rahil-c opened a new issue, #18857:
URL: https://github.com/apache/hudi/issues/18857

   Part of #18676. RFC-104 / [design PR](https://github.com/chrevanthreddy/hudi/pull/1).
   
   ## Scope
   
   **Tracking issue only.** These items are explicitly **out of scope** for the 
milestone-1 sub-issues (1–7). They live here so they don't get lost — each will 
be broken out into its own sub-issue once milestone 1 lands.
   
   ## Deferred work
   
   - **RaBitQ quantization**: replace the raw `array<float>` payload with 
packed binary codes + optional norm scalar (see `RaBitQEncoder.java`, 
`VectorQuantizer.java` in the [design PR](https://github.com/chrevanthreddy/hudi/pull/1)).
   - **Generation manifest & quantizer record**: `__manifest__` / 
`__centroids__` / `__quantizer__` rows for atomic generation activation.
   - **Read-path pruning**: `VectorIndexPruner`, `VectorIndexMdtSearchUtils`, 
`RaBitQApproxDistanceUDF`, `VectorIndexSupport.scala`.
   - **Write path**: assign incoming records to clusters at write time, write 
tombstones for deletes (RFC-104 write-path doc).
   - **Maintenance**: cluster-imbalance / centroid-drift detection, LIRE-style 
incremental rebalancing, generation rebuild.
   - **Flink and Java engine support** (milestone 1 is Spark-first and remains 
Spark-only).
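
For orientation on the first item above, here is a rough sketch of what a "packed binary codes + optional norm scalar" payload could look like. It keeps one sign bit per dimension and stores the L2 norm alongside. This is a deliberately simplified stand-in: it is not the RaBitQ algorithm and not the code in `RaBitQEncoder.java`, and every name in it (`SignBitPackingSketch`, `PackedCode`, `encode`, `hammingDistance`) is hypothetical.

```java
public class SignBitPackingSketch {

    /** Hypothetical payload: one sign bit per dimension plus the vector's L2 norm. */
    public static final class PackedCode {
        public final byte[] bits;
        public final float norm;

        PackedCode(byte[] bits, float norm) {
            this.bits = bits;
            this.norm = norm;
        }
    }

    /** Packs the sign of each component (bit = 1 when the value is >= 0) into bytes. */
    public static PackedCode encode(float[] v) {
        byte[] bits = new byte[(v.length + 7) / 8];
        double sumSq = 0.0;
        for (int i = 0; i < v.length; i++) {
            if (v[i] >= 0f) {
                // Bit i lives in byte i / 8 at position i % 8 (least significant bit first).
                bits[i / 8] |= (byte) (1 << (i % 8));
            }
            sumSq += (double) v[i] * v[i];
        }
        return new PackedCode(bits, (float) Math.sqrt(sumSq));
    }

    /** Counts differing bits between two codes of equal length. */
    public static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("codes must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits so sign extension of negative bytes is not counted.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        PackedCode x = encode(new float[] {0.5f, -1.0f, 2.0f, -0.25f});
        PackedCode y = encode(new float[] {0.4f, 1.0f, -2.0f, -0.5f});
        System.out.println("hamming = " + hammingDistance(x.bits, y.bits));
    }
}
```

A payload shaped like this needs one bit per dimension plus a single float, versus 32 bits per dimension for the raw `array<float>`. The actual encoding, including any rotation or centroid-relative step, is whatever the design PR specifies.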
   
   ## Action
   
   Leave this issue open; close once milestone 1 (sub-issues 1–7) is merged and 
follow-up sub-issues are filed for each item above.
   

