rahil-c opened a new issue, #18857: URL: https://github.com/apache/hudi/issues/18857
Part of #18676. RFC-104 / [design PR](https://github.com/chrevanthreddy/hudi/pull/1).

## Scope

**Tracking issue only.** These items are explicitly **out of scope** for the milestone-1 sub-issues (1–7). They live here so they don't get lost; each will be broken out into its own sub-issue once milestone 1 lands.

## Deferred work

- **RaBitQ quantization**: replace the raw `array<float>` payload with packed binary codes plus an optional norm scalar (see `RaBitQEncoder.java` and `VectorQuantizer.java` in the [design PR](https://github.com/chrevanthreddy/hudi/pull/1)).
- **Generation manifest & quantizer record**: `__manifest__` / `__centroids__` / `__quantizer__` rows for atomic generation activation.
- **Read-path pruning**: `VectorIndexPruner`, `VectorIndexMdtSearchUtils`, `RaBitQApproxDistanceUDF`, `VectorIndexSupport.scala`.
- **Write path**: assign incoming records to clusters at write time and write tombstones for deletes (see the RFC-104 write-path doc).
- **Maintenance**: cluster-imbalance / centroid-drift detection, LIRE-style incremental rebalancing, generation rebuild.
- **Flink and Java engine support**: milestone 1 stays Spark-only, in line with the Spark-first approach.

## Action

Leave this issue open; close it once milestone 1 (sub-issues 1–7) is merged and a follow-up sub-issue has been filed for each item above.
