rahil-c opened a new issue, #18850: URL: https://github.com/apache/hudi/issues/18850
Part of #18676. RFC-104 / [design PR](https://github.com/chrevanthreddy/hudi/pull/1).

## Scope

Schema/registration plumbing only; no data flow yet. Lands the new MDT partition type so subsequent sub-tasks have something to write into.

## Tasks

- Add `VECTOR_INDEX` enum constant in `hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java` with:
  - `partitionPath = "vector_index_"` (multi-instance prefix, mirrors `secondary_index_` and `expr_index_`)
  - new `recordType` value
  - override `getPartitionPath(metaClient, indexName)` to suffix the user index name (see `SECONDARY_INDEX` lines ~234–243)
- Add `PARTITION_NAME_VECTOR_INDEX_PREFIX = "vector_index_"` in `hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java` near lines 210–212.
- Add config knobs in `hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java` (mirror `RECORD_INDEX_*` block, lines 229–370):
  - `hoodie.metadata.vector.index.enable` (default `false`)
  - `hoodie.metadata.vector.index.num.clusters` (default `256`)
  - `hoodie.metadata.vector.index.file.group.count.per.cluster` (default `1`; lets a cluster span N file groups)
  - `hoodie.metadata.vector.index.training.sample.size` (default `1_000_000`; caps KMeans training rows for large tables)

## Tests

- Unit test in `TestMetadataPartitionType` confirming partition path derivation for `vector_index_myidx`.

## Out of scope

Payload schema, file-group mapping, KMeans, and the write path are covered by follow-up sub-issues.
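
For illustration, the partition path derivation and the config defaults listed under Tasks could be sketched as below. This is a standalone sketch, not Hudi code: the class `VectorIndexSketch` and its methods `partitionPath` and `defaults` are hypothetical names, and the simple prefix-plus-name concatenation is an assumption about what "suffix the user index name" means. The real change would live in `MetadataPartitionType.getPartitionPath(metaClient, indexName)` and in `HoodieMetadataConfig`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone sketch; it does not use any Hudi classes.
final class VectorIndexSketch {

    // Prefix value taken from the issue text.
    static final String PARTITION_NAME_VECTOR_INDEX_PREFIX = "vector_index_";

    // Assumption: the partition path is the prefix followed by the user index name.
    static String partitionPath(String indexName) {
        if (indexName == null || indexName.isEmpty()) {
            throw new IllegalArgumentException("indexName must be non-empty");
        }
        return PARTITION_NAME_VECTOR_INDEX_PREFIX + indexName;
    }

    // Config keys and default values as listed in the issue, kept as plain strings.
    static Map<String, String> defaults() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hoodie.metadata.vector.index.enable", "false");
        props.put("hoodie.metadata.vector.index.num.clusters", "256");
        props.put("hoodie.metadata.vector.index.file.group.count.per.cluster", "1");
        props.put("hoodie.metadata.vector.index.training.sample.size", "1000000");
        return props;
    }

    public static void main(String[] args) {
        // Prints the derived path for the index name used in the proposed unit test.
        System.out.println(partitionPath("myidx")); // prints "vector_index_myidx"
        // Prints each config key with its default.
        defaults().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The check on `partitionPath("myidx")` mirrors the assertion the proposed `TestMetadataPartitionType` case would make against the real enum.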
