Thanks to Jingsong for the proposal. Global indexes can be useful in AI scenarios such as vector retrieval, and projects like LanceDB have already adopted this concept. Implementing a global index in the Paimon data lake could further expand its applicability to other scenarios.
I hope Paimon’s index-abstraction interfaces will be designed to support customization by users. I’m looking forward to this feature.

Best wishes,
Xinyu Liu

At 2025-10-23 11:32:02, "Jingsong Li" <[email protected]> wrote:
>Hi everyone,
>
>I'd like to start a new discussion. [1]
>
>Global Index is a new indexing mechanism provided by Paimon, which is
>designed to optimize the performance of field equivalent queries,
>range queries, and complex filtering conditions. Compared with
>traditional file indexes, global indexes are managed through unified
>metadata, It solves the problem of index fragmentation in distributed
>scenarios and supports more flexible query modes. The file index is an
>index file for each file, while the global index is a table-level
>index that manages all data in a unified manner.
>
>The Index Manifest manages global indexes. We already have two Index
>Manifest types, 'DELETION_VECTORS' and 'HASH'. We can first introduce
>the 'BITMAP' global Index. The global index maintains the mapping
>relationship between the index field and the global row id, so the
>global index feature needs to rely on the row-tracking.enabled
>feature.
>
>[1]
>https://cwiki.apache.org/confluence/display/PAIMON/PIP-38%3A+Introduce+Global+Index+for+Paimon+Table
>
>Best,
>Jingsong
