Hi all,

Here is the summary of this Monday's index meeting.

Two blocking decisions closed:

1. An index is *not* a table; it is its own object (it reuses table machinery under the hood). Reasoning: it requires a sort order, allows no column updates, has no partition spec, allows no overlapping ranges between leaves, inherits base-table permissions, and has its own CREATE/DROP/UPDATE INDEX DDL.
2. An index is a separate catalog entity, with no pointers in table metadata. It has its own REST endpoints; the catalog can optionally return index metadata with loadTable to avoid extra round-trips. This keeps table and index updates independent/async.

Next steps: start writing the spec and build out the copy-on-write path now.

Here are the draft specs:
secondary index spec <https://github.com/apache/iceberg/pull/16961>
IRC spec <https://github.com/apache/iceberg/pull/16963>

Thanks,
Huaxin

On Sat, Jun 20, 2026 at 11:11 AM huaxin gao <[email protected]> wrote:

> Hi all,
>
> I built a standalone PoC to validate that the basic index structure works: that we can build a PK index, convert equality deletes to position deletes through it, and have every converted delete land on the correct live row. I ran it up to *100M keys*.
>
> *Headline: the structure works.* The index builds over up to 100M keys, the eq-delete → position-delete conversion resolved correctly at *every* size (100% of converted deletes mapped to the right live row), and the resulting position deletes are *~8× cheaper to apply* at query time than the equality deletes they replace.
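The eq-delete → position-delete conversion described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the PoC code: the dictionary standing in for the index, the function names `build_index` and `convert_equality_deletes`, and the sample file names are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(data_files):
    """Build a hypothetical key -> (file_path, row_position) lookup.

    data_files maps a file path to the list of keys it holds, in row order.
    If a key appears more than once, the last occurrence is treated as live.
    """
    index = {}
    for file_path, keys in data_files.items():
        for row_position, key in enumerate(keys):
            index[key] = (file_path, row_position)
    return index


def convert_equality_deletes(index, deleted_keys):
    """Resolve key-based deletes into per-file position deletes.

    Returns (position_deletes, unresolved): position_deletes maps each file
    path to the sorted row positions to delete; unresolved lists keys that
    had no live row in the index.
    """
    position_deletes = defaultdict(list)
    unresolved = []
    for key in deleted_keys:
        location = index.pop(key, None)  # the row is no longer live afterwards
        if location is None:
            unresolved.append(key)
            continue
        file_path, row_position = location
        position_deletes[file_path].append(row_position)
    for positions in position_deletes.values():
        positions.sort()
    return dict(position_deletes), unresolved


# Example with made-up files and keys.
files = {"a.parquet": [10, 11, 12], "b.parquet": [20, 21]}
idx = build_index(files)
pos_deletes, missing = convert_equality_deletes(idx, [21, 10, 12, 99])
```

In this sketch a delete for a key with no live row is reported back instead of being dropped, which is one way a correctness check like the one in the PoC could be made; how the real index handles that case is not stated in this thread.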
>
> Beyond correctness, the run also shows how the index’s *maintenance* cost scales, comparing copy-on-write (COW, rewrite touched leaves) vs. an append/merge (MOR) option, under a realistic mixed CDC checkpoint (1,000 inserts + 500 updates + 500 deletes), local wall-clock:
>
> keys     EQ baseline   INDEX (COW)   % of 60s (COW)   INDEX (MOR)   % of 60s (MOR)     correct
> 5M       6 ms          6.7s          11.2%            2.2s          3.7%               PASS
> 20M      8 ms          24.2s         40.4%            6.4s          10.6%              PASS
> 50M      7 ms          51.6s         86.1%            12.2s         20.4%              PASS
> *100M*   6 ms          *75.0s*       125% (BEHIND)    *16.9s*       28.2% (keeps up)   PASS
>
> COW maintenance crosses the 60 s checkpoint around 100M (75 s/cycle, 125%); MOR stays at ~28% and keeps pace; the equality-delete baseline is ~6 ms and flat. So the structure works, but *COW alone can’t sustain scattered CDC at hundreds of millions of keys on a single writer*. It’s worth allowing a merge-on-read / update-file maintenance option alongside COW (or sharding the index across parallel writers).
>
> *Full write-up, all tables, and the in-region reality-check:* link <https://docs.google.com/document/d/1G3zxbW8X0eU3UrouslZfp42bBc9CvgJGnJyDONCB4PU/edit?tab=t.0>
>
> Feedback welcome, especially on the spec direction (whether to allow a merge-on-read / update-file maintenance option alongside COW) and on the read-side modeling.
>
> Thanks,
> Huaxin
>
> On Tue, Jun 9, 2026 at 5:45 PM huaxin gao <[email protected]> wrote:
>
>> Sorry, we've skipped posting a few of the dedicated index-sync summaries to the mailing list; you can find those in the Google doc <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.8041k7j2n7y3> and the Slack channel. Here's yesterday's summary:
>>
>> *Decided*
>>
>> - Index vs. table (what we agreed):
>>    - Reuse table implementation/library code and a near-identical spec; the commit path will be custom regardless, so reuse isn't the deciding factor.
>>    - An index is not a table from a user/API view: loading or writing an index as a table must fail (it would violate index invariants).
>>    - The spec forbids most table behaviors: no overlapping files, one mandatory transform sort order, no column updates, no partition spec.
>> - Delete vectors: reuse Iceberg's existing DV; benchmarks showed no new delete format is worth introducing.
>> - Incremental updates: start with copy-on-write only (no update files). For object-store-sized leaves, a full leaf rewrite is about as cheap as maintaining an overlay update file + DV, so we'll skip the MOR machinery for now and add it later only if benchmarks prove we need it (likely just the very-large-leaf case).
>> - Validate the spec first: build a quick, hand-wired prototype (Parquet files structured per the spec) and benchmark it on real scales before formalizing.
>>
>> *Leaning, not final*
>>
>> - Indexes are likely separate catalog objects, linked from the table by storing just an identifier (like materialized views) and not visible in LIST TABLES.
>> - We'll need a commit path for indexes, but simpler than tables (no stage-create).
>>
>> *Still open*
>>
>> - Permissions model: separate vs. inherited (action: look at what real DBs do for index permissions).
>> - REST/catalog RPC design: minimize round-trips; index metadata ideally returned with LOAD TABLE. Catalog RPC cost may dominate Parquet IO, so this needs real design.
>> - Scale modeling: target rows-per-leaf vs. leaf size vs. metadata-file count.
>> - DDL-on-index semantics (reuse table schema-update actions or separate)
>>
>> Thanks,
>> Huaxin
>>
>> On Wed, Apr 22, 2026 at 8:47 AM Péter Váry <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> TL;DR
>>> We still need to validate with ADLS and S3, but based on the local tests, the MPHF approach looks more promising if we can tolerate larger files and longer index maintenance times.
>>>
>>> Details:
>>> Here are the results from the local experiments on my Mac. I removed unnecessary statistics from the Parquet files and tested different row group sizes:
>>>
>>> - For an index file with 1M records, a row group size of 5,000 appears to be the sweet spot.
>>> - For 10M records, 10,000 rows per row group works best.
>>>
>>> If you have additional ideas for optimizing Parquet-based indexes, I’d be very interested to hear them.
>>> The test code is available on this branch: https://github.com/pvary/iceberg/tree/leaf_bench
>>>
>>> Best results:
>>>
>>> *1M records/file*
>>>
>>> - Parquet - 5000 rows/RowGroup
>>>    - Read: 1191 µs - 1 file open, 3 seeks, 123 KB read per lookup
>>>    - Write: 1.7 s, 15 MB
>>> - MPHF
>>>    - Read: 202 µs - 1 file open, 1 seek, 282 KB read per lookup
>>>    - Write: 0.8 s, 34 MB
>>>
>>> *10M records/file*
>>>
>>> - Parquet - 10000 rows/RowGroup
>>>    - Read: 4168 µs - 1 file open, 3 seeks, 395 KB read per lookup
>>>    - Write: 19.5 s, 144 MB
>>> - MPHF
>>>    - Read: 1086 µs - 1 file open, 1 seek, 2.8 MB (2812 KB) read per lookup
>>>    - Write: 6.5 s, 353 MB
>>>
>>> Below are the full results.
>>>
>>> Benchmark                                    (indexType)    (keyType)  (numRows)  Mode  Cnt    Score            Error           Units
>>> InvertedIndexBenchmark.lookup                PARQUET_1000   LONG       1000000    ss    10000  3285.284         ± 5.138         us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_1000   LONG       1000000    ss    10000  2522168989.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_1000   LONG       1000000    ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_1000   LONG       1000000    ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                PARQUET_1000   LONG       10000000   ss    10000  35449.614        ± 34.673        us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_1000   LONG       10000000   ss    10000  24302649201.000                  #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_1000   LONG       10000000   ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_1000   LONG       10000000   ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                PARQUET_5000   LONG       1000000    ss    10000  1191.959         ± 4.169         us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_5000   LONG       1000000    ss    10000  1230877229.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_5000   LONG       1000000    ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_5000   LONG       1000000    ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                PARQUET_5000   LONG       10000000   ss    10000  7236.447         ± 10.374        us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_5000   LONG       10000000   ss    10000  5650715973.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_5000   LONG       10000000   ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_5000   LONG       10000000   ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                PARQUET_10000  LONG       1000000    ss    10000  1349.946         ± 7.834         us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_10000  LONG       1000000    ss    10000  1730219377.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_10000  LONG       1000000    ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_10000  LONG       1000000    ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                PARQUET_10000  LONG       10000000   ss    10000  4168.635         ± 11.051        us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_10000  LONG       10000000   ss    10000  3946341532.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_10000  LONG       10000000   ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_10000  LONG       10000000   ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                PARQUET_50000  LONG       1000000    ss    10000  4736.466         ± 38.179        us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_50000  LONG       1000000    ss    10000  7427413541.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_50000  LONG       1000000    ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_50000  LONG       1000000    ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                PARQUET_50000  LONG       10000000   ss    10000  4979.031         ± 34.708        us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_50000  LONG       10000000   ss    10000  7694887636.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    PARQUET_50000  LONG       10000000   ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          PARQUET_50000  LONG       10000000   ss    10000  30000.000                        #
>>> InvertedIndexBenchmark.lookup                MPHF           LONG       1000000    ss    10000  202.571          ± 2.336         us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      MPHF           LONG       1000000    ss    10000  2821570000.000                   #
>>> InvertedIndexBenchmark.lookup:openStreams    MPHF           LONG       1000000    ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          MPHF           LONG       1000000    ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup                MPHF           LONG       10000000   ss    10000  1086.957         ± 4.524         us/op
>>> InvertedIndexBenchmark.lookup:bytesRead      MPHF           LONG       10000000   ss    10000  28119460000.000                  #
>>> InvertedIndexBenchmark.lookup:openStreams    MPHF           LONG       10000000   ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.lookup:seeks          MPHF           LONG       10000000   ss    10000  10000.000                        #
>>> InvertedIndexBenchmark.write                 PARQUET_1000   LONG       1000000    ss    3      1720731.014      ± 876636.004    us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_1000   LONG       1000000    ss    3      46453317.000                     #
>>> InvertedIndexBenchmark.write                 PARQUET_1000   LONG       10000000   ss    3      18547947.876     ± 12258125.307  us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_1000   LONG       10000000   ss    3      452655675.000                    #
>>> InvertedIndexBenchmark.write                 PARQUET_5000   LONG       1000000    ss    3      1718345.583      ± 1103928.016   us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_5000   LONG       1000000    ss    3      44845788.000                     #
>>> InvertedIndexBenchmark.write                 PARQUET_5000   LONG       10000000   ss    3      18604229.931     ± 2668361.915   us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_5000   LONG       10000000   ss    3      435388818.000                    #
>>> InvertedIndexBenchmark.write                 PARQUET_10000  LONG       1000000    ss    3      1761555.389      ± 535857.675    us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_10000  LONG       1000000    ss    3      44536635.000                     #
>>> InvertedIndexBenchmark.write                 PARQUET_10000  LONG       10000000   ss    3      19501588.264     ± 2130054.558   us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_10000  LONG       10000000   ss    3      433189623.000                    #
>>> InvertedIndexBenchmark.write                 PARQUET_50000  LONG       1000000    ss    3      1936624.889      ± 6601363.985   us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_50000  LONG       1000000    ss    3      44264655.000                     #
>>> InvertedIndexBenchmark.write                 PARQUET_50000  LONG       10000000   ss    3      20471742.278     ± 10705206.310  us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_50000  LONG       10000000   ss    3      431311305.000                    #
>>> InvertedIndexBenchmark.write                 MPHF           LONG       1000000    ss    3      896573.958       ± 1408024.851   us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  MPHF           LONG       1000000    ss    3      102846369.000                    #
>>> InvertedIndexBenchmark.write                 MPHF           LONG       10000000   ss    3      6509348.875      ± 15519975.479  us/op
>>> InvertedIndexBenchmark.write:indexFileBytes  MPHF           LONG       10000000   ss    3      1058435733.000                   #
>>>
>>> huaxin gao <[email protected]> wrote (on Tue, Apr 21, 2026, 20:53):
>>>
>>>> Hi all,
>>>>
>>>> In recent secondary index sync meetings, the discussion converged on the need to define what an index is from first principles before settling on physical layout.
>>>>
>>>> To address that, Peter and I have drafted a requirements document for a key lookup index (renamed from "primary key index" to avoid implying uniqueness enforcement); the goal is to nail down one well-scoped index type first.
>>>>
>>>> Doc: Key Lookup Index Requirements <https://docs.google.com/document/d/1e0zxK-jA0LBDq8YQlQgFipTHelDFiga8lCkgDTmYub8/edit?tab=t.0#heading=h.8shrgabvl19>
>>>>
>>>> It covers requirements, three design options (manifest + sorted Parquet, hash + sorted Parquet, hash + MPHF) and open questions. We will add preliminary benchmark results shortly.
>>>>
>>>> Feedback welcome: inline in the doc, on this thread, or at the next index sync.
>>>>
>>>> Thanks,
>>>>
>>>> Huaxin
>>>>
>>>> On Mon, Apr 13, 2026 at 7:22 AM Steven Wu <[email protected]> wrote:
>>>>
>>>>> Do we need the special index identifier that was originally proposed? A generic CatalogObjectIdentifier (with namespace and name) would be consistent with all object types in the catalog. I have a discussion thread on the generic identifier topic: [DISCUSS] REST Spec: generic CatalogObjectIdentifier.
>>>>>
>>>>> Should we add an indexes array field to table metadata? It would contain only a list of index object identifiers, not any index metadata, which should live in the index objects. Yufei was trying to bring this up at the end of the first sync, but we didn't get enough time to really discuss it. It would be great to discuss this as the first agenda item today.
>>>>>
>>>>> On Mon, Apr 13, 2026 at 3:17 AM Péter Váry <[email protected]> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> We had several engaging discussions at the Iceberg Summit, and it was great to finally catch up with many of you in person. We truly missed those who couldn’t attend; hopefully we’ll all meet again at the next summit.
>>>>>>
>>>>>> To keep the conversation going, Huaxin and I have put together the agenda for our next meeting. As a reminder, we’ll meet on *April 13th, 9:00–10:00 AM* PDT (6:00–7:00 PM CEST).
>>>>>>
>>>>>> Proposed agenda:
>>>>>>
>>>>>> - Continue first-principles index design discussion from Mar 30
>>>>>>    - *Index Ownership and Write Responsibility*
>>>>>>       - Should writers be allowed to update indexes, or
>>>>>>       - Should all index writes be handled exclusively by the Index Maintenance process?
>>>>>>       - If writers can update indexes, then we need to define what guarantees are required (compaction, file splitting, layout expectations).
>>>>>>       - If only Index Maintenance updates indexes, then we only need to define what observable properties should be exposed to consumers, like:
>>>>>>          - Expected max files for a single key
>>>>>>          - Current max files for a single key
>>>>>>          - Deletes allowed/present
>>>>>>          - Sorted by
>>>>>>          - Partitioned by
>>>>>>    - *Specification Scope: What Belongs in the Spec?*
>>>>>>       - Related to the ownership question above
>>>>>>       - Light spec: Just define that the index table should be optimized for retrieval by key columns and that the index columns should be contained in the table. This could give us more flexibility if better organization methods come up, or
>>>>>>       - Detailed spec: We could define the max number of files per index to read for a single key, or even the partitioning and the exact sort order. This could allow more use-cases for a given index, like joins or cardinality estimations.
>>>>>>       - I would go for the light spec for the main types (PK, Containing), with only the Index Maintenance processes updating the indexes, as for many use-cases the details are not important, and writers will very rarely update the indexes themselves.
>>>>>>    - *Logical Placement of Indexes*
>>>>>>       - Index as a child object of an Iceberg Table, or
>>>>>>       - Index as a first‑class entity under /namespace/indexes/{index}
>>>>>>          - Based on the discussions at the summit, we are leaning in this direction. This means the index id must be unique in the namespace, but it helps the catalog implementations quite a bit.
>>>>>>    - *Physical Placement of Index Data*
>>>>>>       - I don’t think we should specify this. We should have a base location for the index, but can rely on the catalog implementations to decide on their own, like they do with tables, views, and UDFs.
>>>>>>    - *Iceberg Reader Based indexes* (Containing indexes and potentially PK indexes). These are the indexes which could be read by the existing Iceberg readers. We might decide to store the PK index similarly to an Iceberg Table and treat it as a reader based index.
>>>>>>       - What are the table properties/features exposed to the readers?
>>>>>>          - Maybe just some behavioral descriptors for the optimizer to decide if the index could be used or should be skipped, like:
>>>>>>             - Expected max files for a single key
>>>>>>             - Max files for a single key
>>>>>>             - Deletes allowed/present
>>>>>>             - Sorted by
>>>>>>             - Partitioned by
>>>>>>          - The Tasks when reading the index based on the filters and projection
>>>>>>       - What are the table properties/features exposed to the Index Maintenance? I think this could be internal to the Index Maintenance process and might not be exposed through the spec. The Index Maintenance process could handle this as a standard Iceberg Table and could be based on the Table Maintenance process, but there might be some totally different processes.
>>>>>>       - It should be possible for the Index Maintenance process to add properties to an index, which could be used and updated in the next Index Maintenance run.
>>>>>> - *PK index storage format benchmark results*
>>>>>>    - Flat Parquet (baseline)
>>>>>>    - BTree with Parquet leaves
>>>>>>    - Vortex
>>>>>> - *Open items / next steps*
>>>>>>
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>> huaxin gao <[email protected]> wrote (on Mon, Mar 23, 2026, 3:03):
>>>>>>
>>>>>>> Hi everyone, I wanted to share an update on the primary key index work.
>>>>>>> Since there are still open questions on whether bloom filter indexes fit in the secondary index framework or should be treated as extended stats, I've shifted focus to the primary key index since it's a clearer fit for the framework.
>>>>>>> I've put together a proposal for a primary key reverse-lookup index that maps each key to its physical location (file_path, row_position). It enables:
>>>>>>>
>>>>>>> - Scan-time file pruning for point lookups
>>>>>>> - Converting key-based deletes into position deletes (eliminating equality deletes for Flink CDC)
>>>>>>> - Accelerating Spark MERGE INTO by replacing full-table joins with direct file lookups
>>>>>>>
>>>>>>> Proposal: https://docs.google.com/document/d/1HuhCZ0n2FqDh8yqQb9oEj1CPM5yXpEsMPGZno2aSf8E/edit?tab=t.0#heading=h.tbevg4q0m9
>>>>>>> Feedback welcome!
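As a rough illustration of the scan-time pruning use case listed above, the sketch below plans a point lookup against a key → (file_path, row_position) mapping. It is only a toy model under stated assumptions: the in-memory dictionary, the function name `plan_point_lookup`, and the file names are hypothetical and are not taken from the proposal.

```python
def plan_point_lookup(index, all_files, wanted_keys):
    """Plan a `key IN (...)` lookup using a hypothetical reverse-lookup index.

    index maps key -> (file_path, row_position). Returns (to_scan, skipped):
    to_scan maps each file that holds a wanted key to the sorted row
    positions to read; skipped lists the files that can be pruned.
    """
    hits = {}
    for key in wanted_keys:
        location = index.get(key)
        if location is None:
            continue  # key not present in the table: nothing to read
        file_path, row_position = location
        hits.setdefault(file_path, []).append(row_position)
    to_scan = {path: sorted(rows) for path, rows in hits.items()}
    skipped = sorted(set(all_files) - set(to_scan))
    return to_scan, skipped


# Example with made-up data: three data files, lookup of three keys.
index = {
    101: ("f1.parquet", 0),
    102: ("f1.parquet", 1),
    201: ("f2.parquet", 0),
    301: ("f3.parquet", 0),
}
all_files = ["f1.parquet", "f2.parquet", "f3.parquet"]
to_scan, skipped = plan_point_lookup(index, all_files, [102, 301, 999])
```

Here the file with no matching key is never opened, which is the effect the first bullet describes; a real planner would also have to handle an index that lags behind the table snapshot.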
>>>>>>> Thanks,
>>>>>>> Huaxin
>>>>>>>
>>>>>>> On Wed, Mar 18, 2026 at 11:42 PM Péter Váry <[email protected]> wrote:
>>>>>>>
>>>>>>>> Key takeaways from the general index discussion at the March 16 meeting.
>>>>>>>> Thanks to everyone who participated! The recording is available here: https://www.youtube.com/watch?v=btmjhtRWUCE
>>>>>>>>
>>>>>>>> - Q: Do we need to tie index types to the algorithms used to access them?
>>>>>>>> - A: From a specification perspective, the goal is to define the storage-level data layout so it can be shared across engines. Engines are free to interpret and use the data as they see fit, but the on-disk data layout itself must be strictly defined and interoperable.
>>>>>>>>
>>>>>>>> - Q: Should we introduce an additional abstraction layer (e.g., Vector Index) with sub-types such as IVF and DiskANN?
>>>>>>>> - A: This is possible if we decide it is beneficial. I explored potential naming, but it is not yet clear how such a layer would be used in practice.
>>>>>>>> *Question to Yingyi Bu*: could you provide examples where this additional layer would be useful? Should this abstraction be defined at the spec level, or is it better handled at the engine level?
>>>>>>>> My initial idea was that users would create a generic Vector Index and let the engine choose the concrete implementation. However, this would limit user control, and users likely need to specify the exact index representation, which implies they must be aware of the available representations.
>>>>>>>>
>>>>>>>> - Q: Do we want to allow extensibility for index types?
>>>>>>>> - A: Yes. The intent is to support a small set of well-defined index types while allowing experimentation with new ones. If a new index type proves broadly useful, a follow-up proposal can standardize it and incorporate it into the spec.
>>>>>>>>
>>>>>>>> - Q: Do we allow multiple versions of an index for the same table snapshot?
>>>>>>>> - A: Yes. Older index versions must be retained for readers that have already started using them, while new readers should automatically use the latest available version.
>>>>>>>>
>>>>>>>> - Q: Do we need to use materialized views for these indexes?
>>>>>>>> - A: No. These indexes are primarily examples, and different types may require different storage methods. However, the Primary Key, Containing, and parts of the IVF indexes can be structured as Iceberg tables. This allows engines to read them natively; in some cases, Iceberg planners can automatically redirect queries to the index table without engine modifications. Furthermore, index maintenance for these tables can leverage existing materialized view maintenance workflows. Other index types may instead rely on Puffin files or alternative storage approaches.
>>>>>>>>
>>>>>>>> - Q: How should index metadata be accessed? Should we add explicit pointers for the indexes in the table metadata?
>>>>>>>> - A: We did not have sufficient time to fully explore and conclude this topic.
>>>>>>>> *Question for Yufei Gu*: Did I understand correctly that your main concern stems from endpoint resolution from a REST Catalog perspective? Specifically, if indexes are exposed under a URI such as v1/{prefix}/namespaces/{namespace}/tables/{table}/indexes/{index}, would this make it more difficult for the REST Catalog to resolve and route requests to the appropriate endpoint?
>>>>>>>>
>>>>>>>> Suhas Jayaram Subramanya via dev <[email protected]> wrote (on Fri, Mar 13, 2026, 23:32):
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> Here's a proposal for native Vector Index support in Iceberg tables: https://docs.google.com/document/d/1KL4qLOwdqnhOcqTc0EjO1O16NV3M3c-gZCEINDWw4lA/edit?usp=sharing
>>>>>>>>>
>>>>>>>>> We've been working on this proposal with Peter internally at Microsoft and he suggested we post it here to bring this to the community's attention, ahead of the next Secondary Index Sync.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Suhas
>>>>>>>>>
>>>>>>>>> On 2026/02/19 04:34:34 huaxin gao wrote:
>>>>>>>>> > Hi Everyone,
>>>>>>>>> >
>>>>>>>>> > Here are the recording and notes from the Iceberg Index Support Sync on 2/11.
>>>>>>>>> >
>>>>>>>>> > Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>>>>>> >
>>>>>>>>> > Notes: https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>>>>>> >
>>>>>>>>> > The meeting will move to biweekly, Mondays 9–10am PST, starting March 2.
>>>>>>>>> >
>>>>>>>>> > Since the sync, I updated the Bloom skipping index proposal <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu> to address the discussion questions, specifically:
>>>>>>>>> >
>>>>>>>>> > - Performance justification: when this helps (high-cardinality = / IN, many data files, high object-store latency) and how it differs from Parquet row-group Bloom filters (which still require opening the data file).
>>>>>>>>> > - Cost / scalability: rough sizing (Bloom blob size per file, Puffin file size), the planning cost trade-off (driver index reads vs executor file opens), and mitigations via caching.
>>>>>>>>> > - Lifecycle / maintenance: incremental production as new data files arrive, behavior when the index is missing/behind, and sharding/compaction plus cleanup to avoid accumulating too many small Puffin files over time.
>>>>>>>>> > - Writer expectations: inline (optional) vs asynchronous (primary) index creation.
>>>>>>>>> >
>>>>>>>>> > I also implemented a Spark 4.1 POC <https://github.com/apache/iceberg/pull/15311> and a local benchmark to quantify both the pruning impact (plannedFiles → afterBloom) and the index read overhead (statsFiles, statsBytes, bloomPayloadBytes) for point predicates on high-cardinality columns. Please take a look and let me know if you have any questions or feedback.
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> >
>>>>>>>>> > Huaxin
>>>>>>>>> >
>>>>>>>>> > On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > > Reminder for tomorrow's sync on Iceberg Index Support.
>>>>>>>>> > >
>>>>>>>>> > > Wednesday: Feb. 11 9:00 – 10:00am
>>>>>>>>> > > Time zone: America/Los_Angeles
>>>>>>>>> > > Google Meet joining info
>>>>>>>>> > > Video call link: meet.google.com/nsp-ctyr-khk
>>>>>>>>> > > Design doc:
>>>>>>>>> > > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>>>>>> > > https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>>>>>> > >
>>>>>>>>> > > Thanks,
>>>>>>>>> > > Huaxin
>>>>>>>>> > >
>>>>>>>>> > > On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <[email protected]> wrote:
>>>>>>>>> > >
>>>>>>>>> > >> Thanks Huaxin and Steven for organizing this. Looking forward to meeting you all next week!
>>>>>>>>> > >>
>>>>>>>>> > >> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>>>>>> > >>
>>>>>>>>> > >>> We set up the dev calendar event with a new Google Meet link. Please ignore the link from Huaxin's original email.
>>>>>>>>> > >>>
>>>>>>>>> > >>> The dev calendar has the correct info (including the new meeting link):
>>>>>>>>> > >>>
>>>>>>>>> > >>> Iceberg Index Support Sync
>>>>>>>>> > >>> Wednesday, February 11 · 9:00 – 10:00am
>>>>>>>>> > >>> Time zone: America/Los_Angeles
>>>>>>>>> > >>> Google Meet joining info
>>>>>>>>> > >>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>>>>>> > >>>
>>>>>>>>> > >>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]> wrote:
>>>>>>>>> > >>>
>>>>>>>>> > >>>> Sorry, I meant PST (not EST) :)
>>>>>>>>> > >>>> Looking forward to the discussion!
>>>>>>>>> > >>>>
>>>>>>>>> > >>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]> wrote:
>>>>>>>>> > >>>>
>>>>>>>>> > >>>>> Hi Huaxin,
>>>>>>>>> > >>>>>
>>>>>>>>> > >>>>> Thanks for starting the sync!
>>>>>>>>> > >>>>>
>>>>>>>>> > >>>>> The meeting seems to be 9-10AM PST on the dev events calendar <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>, not EST. Maybe it's a typo?
>>>>>>>>> > >>>>> Otherwise, looking forward to the discussion!
>>>>>>>>> > >>>>>
>>>>>>>>> > >>>>> Best,
>>>>>>>>> > >>>>> Shawn
>>>>>>>>> > >>>>>
>>>>>>>>> > >>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]> wrote:
>>>>>>>>> > >>>>>
>>>>>>>>> > >>>>>> Hi all,
>>>>>>>>> > >>>>>> I'd like to start a dedicated sync to discuss Iceberg Index support. Here is the existing discussion thread: https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty.
>>>>>>>>> > >>>>>>
>>>>>>>>> > >>>>>> To ground the discussion, here are the two proposals:
>>>>>>>>> > >>>>>>
>>>>>>>>> > >>>>>> - Peter's proposal <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2> (overall index support)
>>>>>>>>> > >>>>>> - My proposal <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7> (bloom filter skipping index)
>>>>>>>>> > >>>>>>
>>>>>>>>> > >>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST, starting next Wednesday (2/11). After FileFormat sync finishes, we plan to use that slot and switch to every other Monday, 9 AM to 10 AM EST.
>>>>>>>>> > >>>>>>
>>>>>>>>> > >>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>>>> > >>>>>>
>>>>>>>>> > >>>>>> Thanks,
>>>>>>>>> > >>>>>> Huaxin
