rahil-c commented on code in PR #18867:
URL: https://github.com/apache/hudi/pull/18867#discussion_r3319039668
##########
website/docs/lance_file_format.md:
##########
@@ -87,7 +119,45 @@ All Hudi table services work with Lance-backed tables:
 - **Compaction** — merges log files into Lance base files
 - **Clustering** — reorganizes Lance files for better data locality
 - **Cleaning** — removes old Lance file versions
-- **Metadata indexing** — column stats and bloom filters work across Lance files
+- **Metadata indexing** — bloom filters work across Lance files; column stats and partition stats are
+  **automatically disabled** for Lance tables
+
+## VECTOR Storage on Lance
+
+VECTOR columns are stored natively in Lance as `FixedSizeList<Float32/Float64, dim>` — Lance's own
+vector column encoding. This unlocks Lance's built-in IVF-PQ approximate nearest neighbor (ANN) index
+for high-throughput vector search without any data conversion overhead.
+
+Only **FLOAT** and **DOUBLE** element types are supported as VECTOR columns on Lance. INT8 vectors
+are not yet supported and will fail fast at write time.
+
+The `hudi_vector_search` TVF automatically uses the Lance IVF-PQ index when the table uses Lance as

Review Comment:
   @yihua this is not correct: our Hudi TVF does not use any of Lance's indexes.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
