Re: Re: Dedicated sync for Iceberg Index Support

huaxin gao Tue, 09 Jun 2026 17:45:31 -0700

Sorry, we've skipped posting a few of the dedicated index-sync summaries
to the mailing list; you can find those in the Google doc
<https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.8041k7j2n7y3>
and the Slack channel. Here's yesterday's summary:


*Decided*

   - Index vs. table (what we agreed):
      - Reuse table implementation/library code and a near-identical spec.
      The commit path will be custom regardless, so reuse isn't the
      deciding factor.
      - An index is not a table from a user/API view: loading or writing an
      index as a table must fail (it would violate index invariants).
      - The spec forbids most table behaviors: no overlapping files, one
      mandatory transform sort order, no column updates, no partition spec.
   - Delete vectors: reuse Iceberg's existing DV; benchmarks showed no new
   delete format is worth introducing.
   - Incremental updates: start with copy-on-write only (no update files).
   For object-store-sized leaves, a full leaf rewrite is about as cheap as
   maintaining an overlay update file + DV, so we'll skip the MOR machinery
   for now and add it later only if benchmarks prove we need it (likely just
   the very-large-leaf case).
   - Validate the spec first: build a quick, hand-wired prototype (Parquet
   files structured per the spec) and benchmark it on real scales before
   formalizing.
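
To make the copy-on-write item concrete, here is a minimal sketch, not the
agreed design. It models a leaf as a sorted list of (key, file_path,
row_position) entries and applies a batch of changes by producing a whole new
leaf, with no overlay update file and no DV. The names `Entry` and
`rewrite_leaf` are hypothetical and not part of any Iceberg API, and the
sketch assumes at most one entry per key, which the real index may not
guarantee.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    key: int
    file_path: str
    row_position: int


def rewrite_leaf(leaf, upserts, deleted_keys):
    """Return a new sorted leaf; the old leaf is left untouched (copy-on-write)."""
    merged = {e.key: e for e in leaf}   # start from the existing entries
    for e in upserts:                   # new or relocated keys replace old ones
        merged[e.key] = e
    for k in deleted_keys:              # deleted keys are dropped from the new leaf
        merged.pop(k, None)
    return sorted(merged.values(), key=lambda e: e.key)


old_leaf = [
    Entry(1, "a.parquet", 0),
    Entry(2, "a.parquet", 1),
    Entry(5, "b.parquet", 0),
]
new_leaf = rewrite_leaf(
    old_leaf,
    upserts=[Entry(2, "c.parquet", 7), Entry(3, "c.parquet", 8)],
    deleted_keys=[5],
)
```

Readers holding `old_leaf` keep seeing a consistent view, which is the
property that lets us defer the MOR machinery.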

*Leaning, not final*

   - Indexes are likely separate catalog objects, linked from the table by
   storing just an identifier (like materialized views) and not visible in
   LIST TABLES.
   - We'll need a commit path for indexes, but simpler than tables (no
   stage-create).

*Still open*


   - Permissions model: separate vs. inherited (action: look at what real
   DBs do for index permissions).
   - REST/catalog RPC design: minimize round-trips; index metadata ideally
   returned with LOAD TABLE. Catalog RPC cost may dominate Parquet IO, so
   this needs real design.
   - Scale modeling: target rows-per-leaf vs. leaf size vs. metadata-file
   count.
   - DDL-on-index semantics (reuse table schema-update actions or separate).
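
For the scale-modeling item, a back-of-the-envelope helper might be a useful
starting point. It relates total indexed rows, rows per leaf, leaf size and
leaf-file count. The default of about 15 bytes per row is an assumption taken
from Péter's local Parquet numbers quoted further down this thread (roughly
15 MB per 1M-record file); it will differ for other key types and needs
confirming on real object stores. `leaf_model` is a hypothetical name, and
metadata-file count is not modeled here.

```python
import math


def leaf_model(total_rows, rows_per_leaf, bytes_per_row=15.0):
    """Estimate leaf-file count and per-leaf size for a rows-per-leaf target."""
    leaf_count = math.ceil(total_rows / rows_per_leaf)
    leaf_bytes = rows_per_leaf * bytes_per_row
    return leaf_count, leaf_bytes


# Compare the two leaf sizes that were benchmarked, for a 10-billion-row table.
for rows_per_leaf in (1_000_000, 10_000_000):
    count, size = leaf_model(total_rows=10_000_000_000, rows_per_leaf=rows_per_leaf)
    print(f"{rows_per_leaf:>10} rows/leaf -> {count:>6} leaves of ~{size / 1e6:.0f} MB")
```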


Thanks,
Huaxin

On Wed, Apr 22, 2026 at 8:47 AM Péter Váry <[email protected]>
wrote:

> Hi All,
>
> TL;DR
> We still need to validate with ADLS and S3, but based on the local tests,
> the MPHF approach looks more promising if we can tolerate larger files and
> longer index maintenance times.
>
> Details:
> Here are the results from the local experiments on my Mac. I removed
> unnecessary statistics from the Parquet files and tested different row
> group sizes:
>
>    - For an index file with 1M records, a row group size of 5,000 appears
>    to be the sweet spot.
>    - For 10M records, 10,000 rows per row group works best.
>
> If you have additional ideas for optimizing Parquet-based indexes, I’d be
> very interested to hear them.
> The test code is available on this branch:
> https://github.com/pvary/iceberg/tree/leaf_bench
>
> Best results:
> *1m records/file*
>
>    - Parquet - 5000 row/RowGroup
>       - Read: 1191 µs - 1 file open, 3 seek, 123KB read per lookup
>       - Write: 1.7 s, 15 MB
>    - MPHF
>       - Read: 202 µs - 1 file open, 1 seek, 282KB read per lookup
>       - Write: 0.9 s, 34 MB
>
> *10m records/file*
>
>    - Parquet - 10000 row/RowGroup
>       - Read: 4168 µs - 1 file open, 3 seek, 395KB read per lookup
>       - Write: 19.5 s, 144 MB
>    - MPHF
>       - Read: 1086 µs - 1 file open, 1 seek, 2.8 MB (2812KB) read per
>       lookup
>       - Write: 6.5 s, 353 MB
>
> Below are the full results.
>
> Benchmark                                      (indexType)  (keyType)  (numRows)  Mode    Cnt            Score           Error  Units
> InvertedIndexBenchmark.lookup                 PARQUET_1000       LONG    1000000    ss  10000         3285.284  ±        5.138  us/op
> InvertedIndexBenchmark.lookup:bytesRead       PARQUET_1000       LONG    1000000    ss  10000   2522168989.000                  #
> InvertedIndexBenchmark.lookup:openStreams     PARQUET_1000       LONG    1000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks           PARQUET_1000       LONG    1000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                 PARQUET_1000       LONG   10000000    ss  10000        35449.614  ±       34.673  us/op
> InvertedIndexBenchmark.lookup:bytesRead       PARQUET_1000       LONG   10000000    ss  10000  24302649201.000                  #
> InvertedIndexBenchmark.lookup:openStreams     PARQUET_1000       LONG   10000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks           PARQUET_1000       LONG   10000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                 PARQUET_5000       LONG    1000000    ss  10000         1191.959  ±        4.169  us/op
> InvertedIndexBenchmark.lookup:bytesRead       PARQUET_5000       LONG    1000000    ss  10000   1230877229.000                  #
> InvertedIndexBenchmark.lookup:openStreams     PARQUET_5000       LONG    1000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks           PARQUET_5000       LONG    1000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                 PARQUET_5000       LONG   10000000    ss  10000         7236.447  ±       10.374  us/op
> InvertedIndexBenchmark.lookup:bytesRead       PARQUET_5000       LONG   10000000    ss  10000   5650715973.000                  #
> InvertedIndexBenchmark.lookup:openStreams     PARQUET_5000       LONG   10000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks           PARQUET_5000       LONG   10000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                PARQUET_10000       LONG    1000000    ss  10000         1349.946  ±        7.834  us/op
> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_10000       LONG    1000000    ss  10000   1730219377.000                  #
> InvertedIndexBenchmark.lookup:openStreams    PARQUET_10000       LONG    1000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks          PARQUET_10000       LONG    1000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                PARQUET_10000       LONG   10000000    ss  10000         4168.635  ±       11.051  us/op
> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_10000       LONG   10000000    ss  10000   3946341532.000                  #
> InvertedIndexBenchmark.lookup:openStreams    PARQUET_10000       LONG   10000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks          PARQUET_10000       LONG   10000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                PARQUET_50000       LONG    1000000    ss  10000         4736.466  ±       38.179  us/op
> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_50000       LONG    1000000    ss  10000   7427413541.000                  #
> InvertedIndexBenchmark.lookup:openStreams    PARQUET_50000       LONG    1000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks          PARQUET_50000       LONG    1000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                PARQUET_50000       LONG   10000000    ss  10000         4979.031  ±       34.708  us/op
> InvertedIndexBenchmark.lookup:bytesRead      PARQUET_50000       LONG   10000000    ss  10000   7694887636.000                  #
> InvertedIndexBenchmark.lookup:openStreams    PARQUET_50000       LONG   10000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks          PARQUET_50000       LONG   10000000    ss  10000        30000.000                  #
> InvertedIndexBenchmark.lookup                         MPHF       LONG    1000000    ss  10000          202.571  ±        2.336  us/op
> InvertedIndexBenchmark.lookup:bytesRead               MPHF       LONG    1000000    ss  10000   2821570000.000                  #
> InvertedIndexBenchmark.lookup:openStreams             MPHF       LONG    1000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks                   MPHF       LONG    1000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup                         MPHF       LONG   10000000    ss  10000         1086.957  ±        4.524  us/op
> InvertedIndexBenchmark.lookup:bytesRead               MPHF       LONG   10000000    ss  10000  28119460000.000                  #
> InvertedIndexBenchmark.lookup:openStreams             MPHF       LONG   10000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.lookup:seeks                   MPHF       LONG   10000000    ss  10000        10000.000                  #
> InvertedIndexBenchmark.write                  PARQUET_1000       LONG    1000000    ss      3      1720731.014  ±   876636.004  us/op
> InvertedIndexBenchmark.write:indexFileBytes   PARQUET_1000       LONG    1000000    ss      3     46453317.000                  #
> InvertedIndexBenchmark.write                  PARQUET_1000       LONG   10000000    ss      3     18547947.876  ± 12258125.307  us/op
> InvertedIndexBenchmark.write:indexFileBytes   PARQUET_1000       LONG   10000000    ss      3    452655675.000                  #
> InvertedIndexBenchmark.write                  PARQUET_5000       LONG    1000000    ss      3      1718345.583  ±  1103928.016  us/op
> InvertedIndexBenchmark.write:indexFileBytes   PARQUET_5000       LONG    1000000    ss      3     44845788.000                  #
> InvertedIndexBenchmark.write                  PARQUET_5000       LONG   10000000    ss      3     18604229.931  ±  2668361.915  us/op
> InvertedIndexBenchmark.write:indexFileBytes   PARQUET_5000       LONG   10000000    ss      3    435388818.000                  #
> InvertedIndexBenchmark.write                 PARQUET_10000       LONG    1000000    ss      3      1761555.389  ±   535857.675  us/op
> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_10000       LONG    1000000    ss      3     44536635.000                  #
> InvertedIndexBenchmark.write                 PARQUET_10000       LONG   10000000    ss      3     19501588.264  ±  2130054.558  us/op
> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_10000       LONG   10000000    ss      3    433189623.000                  #
> InvertedIndexBenchmark.write                 PARQUET_50000       LONG    1000000    ss      3      1936624.889  ±  6601363.985  us/op
> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_50000       LONG    1000000    ss      3     44264655.000                  #
> InvertedIndexBenchmark.write                 PARQUET_50000       LONG   10000000    ss      3     20471742.278  ± 10705206.310  us/op
> InvertedIndexBenchmark.write:indexFileBytes  PARQUET_50000       LONG   10000000    ss      3    431311305.000                  #
> InvertedIndexBenchmark.write                          MPHF       LONG    1000000    ss      3       896573.958  ±  1408024.851  us/op
> InvertedIndexBenchmark.write:indexFileBytes           MPHF       LONG    1000000    ss      3    102846369.000                  #
> InvertedIndexBenchmark.write                          MPHF       LONG   10000000    ss      3      6509348.875  ± 15519975.479  us/op
> InvertedIndexBenchmark.write:indexFileBytes           MPHF       LONG   10000000    ss      3   1058435733.000                  #
>
> huaxin gao <[email protected]> wrote (on Tue, Apr 21, 2026, 20:53):
>
>> Hi all,
>>
>> In recent secondary index sync meetings, the discussion converged on the
>> need to define what an index is from first principles before settling on
>> physical layout.
>>
>> To address that, Peter and I have drafted a requirements document for a
>> key lookup index (renamed from "primary key index" to avoid implying
>> uniqueness enforcement). The goal is to nail down one well-scoped index
>> type first.
>>
>> Doc: Key Lookup Index Requirements
>> <https://docs.google.com/document/d/1e0zxK-jA0LBDq8YQlQgFipTHelDFiga8lCkgDTmYub8/edit?tab=t.0#heading=h.8shrgabvl19>
>>
>> It covers requirements, three design options (manifest + sorted Parquet,
>> hash + sorted Parquet, hash + MPHF) and open questions. We will add
>> preliminary benchmark results shortly.
>>
>> Feedback welcome — inline in the doc, on this thread, or at the next
>> index sync.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Mon, Apr 13, 2026 at 7:22 AM Steven Wu <[email protected]> wrote:
>>
>>> Do we need the special index identifier that was originally proposed? A
>>> generic CatalogObjectIdentifier (with namespace and name) would be
>>> consistent with all object types in the catalog. I have a discussion thread
>>> on the generic identifier topic: [DISCUSS] REST Spec: generic
>>> CatalogObjectIdentifier.
>>>
>>> Should we add an indexes array field to table metadata? It only
>>> contains a list of index object identifiers. It doesn't contain any index
>>> metadata which should live in the index objects. Yufei was trying to bring
>>> this up at the end of the first sync. But we didn't get enough time to
>>> really discuss it. It will be great to discuss this as the first agenda
>>> item today.
>>>
>>> On Mon, Apr 13, 2026 at 3:17 AM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> We had several engaging discussions at the Iceberg Summit, and it was
>>>> great to finally catch up with many of you in person. We truly missed those
>>>> who couldn’t attend; hopefully we’ll all meet again at the next summit.
>>>>
>>>> To keep the conversation going, Huaxin and I have put together the
>>>> agenda for our next meeting. As a reminder, we’ll meet on *April 13th,
>>>> 9:00–10:00 AM* PDT (6:00–7:00 PM CEST).
>>>>
>>>> Proposed agenda:
>>>>
>>>>    - Continue first-principles index design discussion from Mar 30
>>>>       - *Index Ownership and Write Responsibility*
>>>>          - Should writers be allowed to update indexes, or
>>>>          - Should all index writes be handled exclusively by the Index
>>>>          Maintenance process?
>>>>          - If writers can update indexes then we need to define what
>>>>          guarantees are required (compaction, file splitting, layout
>>>>          expectations)?
>>>>          - If only Index Maintenance updates indexes then we only need
>>>>          to define what observable properties should be exposed to
>>>>          consumers? Like:
>>>>             - Expected max files for a single key
>>>>             - Current max files for a single key
>>>>             - Deletes allowed/present
>>>>             - Sorted by
>>>>             - Partitioned by
>>>>       - *Specification Scope: What Belongs in the Spec?*
>>>>          - Related to the ownership question above
>>>>          - Light spec: Just define that the index table should be
>>>>          optimized for retrieval by key columns and the index columns
>>>>          should be contained in the table. This could give us more
>>>>          flexibility if better organization methods come up, or
>>>>          - Detailed spec: We could define the max number of files per
>>>>          index to read for a single key, or even the partitioning and the
>>>>          exact sort order. This could allow more use-cases for a given
>>>>          index, like joins or cardinality estimations.
>>>>          - I would go for light spec for the main types (PK,
>>>>          Containing) and only the Index Maintenance processes should
>>>>          update the Indexes, as for many use-cases the details are not
>>>>          important, and writers will very rarely update the Indexes
>>>>          themselves.
>>>>       - *Logical Placement of Indexes*
>>>>          - Index as a child object of an Iceberg Table, or
>>>>          - Index as a first‑class entity under
>>>>          /namespace/indexes/{index}
>>>>          - Based on the discussions at the summit we are leaning in
>>>>          this direction. This means the index id should be unique in the
>>>>          namespace, but it helps the catalog implementations quite a bit
>>>>       - *Physical Placement of Index Data*
>>>>          - I don’t think we should specify this. We should have a base
>>>>          location for the index, but can rely on the catalog
>>>>          implementations to decide on their own, like they do with the
>>>>          tables, views, udfs.
>>>>       - *Iceberg Reader Based indexes* (Containing indexes and
>>>>       potentially PK indexes). These are the indexes which could be read
>>>>       by the existing Iceberg readers. We might decide to store the PK
>>>>       index similarly to an Iceberg Table and treat it as a reader based
>>>>       index.
>>>>          - What are the table properties/features exposed to the
>>>>          readers
>>>>             - Maybe just some behavioral descriptors for the optimizer
>>>>             to decide if the index could be used or should be skipped,
>>>>             like:
>>>>                - Expected max files for a single key
>>>>                - max files for a single key
>>>>                - Deletes allowed/present
>>>>                - Sorted by
>>>>                - Partitioned by
>>>>             - The Tasks when reading the index based on the filters
>>>>             and projection
>>>>          - What are the table properties/features exposed to the Index
>>>>          Maintenance. I think this could be internal to the Index
>>>>          Maintenance process and might not be exposed through the spec.
>>>>          The Index Maintenance process could handle this as a standard
>>>>          Iceberg Table and could be based on the Table Maintenance
>>>>          process, but there might be some totally different processes.
>>>>       - It should be possible to add properties to an index defined by
>>>>       the Index Maintenance process which could be used and updated in the
>>>>       next Index Maintenance run.
>>>>    - *PK index storage format benchmark results*
>>>>       - Flat Parquet (baseline)
>>>>       - BTree with Parquet leaves
>>>>       - Vortex
>>>>    - *Open items / next steps*
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> huaxin gao <[email protected]> wrote (on Mon, Mar 23, 2026, 3:03):
>>>>
>>>>> Hi everyone, I wanted to share an update on the primary key index work.
>>>>> Since there are still open questions on whether bloom filter indexes
>>>>> fit in the secondary index framework or should be treated as extended
>>>>> stats, I've shifted focus to the primary key index since it's a clearer
>>>>> fit for the framework.
>>>>> I've put together a proposal for a primary key reverse-lookup index
>>>>> that maps each key to its physical location (file_path, row_position). It
>>>>> enables:
>>>>>
>>>>>    - Scan-time file pruning for point lookups
>>>>>    - Converting key-based deletes into position deletes (eliminating
>>>>>    equality deletes for Flink CDC)
>>>>>    - Accelerating Spark MERGE INTO by replacing full-table joins with
>>>>>    direct file lookups
>>>>>
>>>>> Proposal:
>>>>> https://docs.google.com/document/d/1HuhCZ0n2FqDh8yqQb9oEj1CPM5yXpEsMPGZno2aSf8E/edit?tab=t.0#heading=h.tbevg4q0m9
>>>>> Feedback welcome!
>>>>> Thanks,
>>>>> Huaxin
>>>>>
>>>>> On Wed, Mar 18, 2026 at 11:42 PM Péter Váry <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Key takeaways from the general index discussion at the March 16 meeting.
>>>>>> Thanks to everyone who participated! The recording is available here:
>>>>>> https://www.youtube.com/watch?v=btmjhtRWUCE
>>>>>>
>>>>>>    - Q: Do we need to tie index types to the algorithms used to
>>>>>>    access them?
>>>>>>    - A: From a specification perspective, the goal is to define the
>>>>>>    storage-level data layout so it can be shared across engines.
>>>>>>    Engines are free to interpret and use the data as they see fit, but
>>>>>>    the on-disk data layout itself must be strictly defined and
>>>>>>    interoperable.
>>>>>>
>>>>>>    - Q: Should we introduce an additional abstraction layer (e.g.,
>>>>>>    Vector Index) with sub-types such as IVF and DiskANN?
>>>>>>    - A: This is possible if we decide it is beneficial. I explored
>>>>>>    potential naming, but it is not yet clear how such a layer would be
>>>>>>    used in practice.
>>>>>>    *Question to Yingyi Bu*: could you provide examples where this
>>>>>>    additional layer would be useful? Should this abstraction be defined
>>>>>>    at the spec level, or is it better handled at the engine level?
>>>>>>    My initial idea was that users would create a generic Vector
>>>>>>    Index and let the engine choose the concrete implementation.
>>>>>>    However, this would limit user control and users likely need to
>>>>>>    specify the exact index representation, which implies they must be
>>>>>>    aware of the available representations.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    - Q: Do we want to allow extensibility for index types?
>>>>>>    - A: Yes. The intent is to support a small set of well-defined
>>>>>>    index types while allowing experimentation with new ones. If a new
>>>>>>    index type proves broadly useful, a follow-up proposal can
>>>>>>    standardize it and incorporate it into the spec.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    - Q: Do we allow multiple versions of an index for the same table
>>>>>>    snapshot?
>>>>>>    - A: Yes. Older index versions must be retained for readers that
>>>>>>    have already started using them, while new readers should
>>>>>>    automatically use the latest available version.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    - Q: Do we need to use materialized views for these indexes?
>>>>>>    - A: No. These indexes are primarily examples, and different
>>>>>>    types may require different storage methods. However, the Primary Key,
>>>>>>    Containing, and parts of the IVF indexes can be structured as Iceberg
>>>>>>    tables. This allows engines to read them natively; in some cases,
>>>>>>    Iceberg planners can automatically redirect queries to the index
>>>>>>    table without engine modifications. Furthermore, index maintenance
>>>>>>    for these tables can leverage existing materialized view maintenance
>>>>>>    workflows. Other index types may instead rely on Puffin files or
>>>>>>    alternative storage approaches.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    - Q: How should index metadata be accessed? Should we add
>>>>>>    explicit pointers for the indexes in the table metadata?
>>>>>>    - A: We did not have sufficient time to fully explore and
>>>>>>    conclude this topic.
>>>>>>    *Question for Yufei Gu*: Did I understand correctly that your
>>>>>>    main concern stems from endpoint resolution from a REST Catalog
>>>>>>    perspective? Specifically, if indexes are exposed under a URI such as
>>>>>>    v1/{prefix}/namespaces/{namespace}/tables/{table}/indexes/{index},
>>>>>>    would this make it more difficult for the REST Catalog to resolve and
>>>>>>    route requests to the appropriate endpoint?
>>>>>>
>>>>>>
>>>>>> Suhas Jayaram Subramanya via dev <[email protected]> wrote
>>>>>> (on Fri, Mar 13, 2026, 23:32):
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Here's a proposal for native Vector Index support in Iceberg tables
>>>>>>> --
>>>>>>> https://docs.google.com/document/d/1KL4qLOwdqnhOcqTc0EjO1O16NV3M3c-gZCEINDWw4lA/edit?usp=sharing
>>>>>>>
>>>>>>> We've been working on this proposal with Peter internally at
>>>>>>> Microsoft and he suggested we post it here to bring this to the
>>>>>>> community's attention, ahead of the next Secondary Index Sync.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Suhas
>>>>>>>
>>>>>>> On 2026/02/19 04:34:34 huaxin gao wrote:
>>>>>>> > Hi Everyone,
>>>>>>> >
>>>>>>> > Here are the recording and notes from the Iceberg Index Support
>>>>>>> > Sync on 2/11.
>>>>>>> >
>>>>>>> > Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>>>> >
>>>>>>> > Notes:
>>>>>>> > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>>>> >
>>>>>>> > The meeting will move to biweekly, Mondays 9–10am PST, starting
>>>>>>> > March 2.
>>>>>>> >
>>>>>>> > Since the sync, I updated the Bloom skipping index proposal
>>>>>>> > <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>>>> > to address the discussion questions, specifically:
>>>>>>> >
>>>>>>> > - Performance justification: when this helps (high-cardinality = /
>>>>>>> > IN, many data files, high object-store latency) and how it differs
>>>>>>> > from Parquet row-group Bloom filters (which still require opening
>>>>>>> > the data file).
>>>>>>> > - Cost / scalability: rough sizing (Bloom blob size per file,
>>>>>>> > Puffin file size), the planning cost trade-off (driver index reads
>>>>>>> > vs executor file opens), and mitigations via caching.
>>>>>>> > - Lifecycle / maintenance: incremental production as new data files
>>>>>>> > arrive, behavior when the index is missing/behind, and
>>>>>>> > sharding/compaction plus cleanup to avoid accumulating too many
>>>>>>> > small Puffin files over time.
>>>>>>> > - Writer expectations: inline (optional) vs asynchronous (primary)
>>>>>>> > index creation.
>>>>>>> >
>>>>>>> > I also implemented a Spark 4.1 POC
>>>>>>> > <https://github.com/apache/iceberg/pull/15311> and a local
>>>>>>> > benchmark to quantify both the pruning impact (plannedFiles →
>>>>>>> > afterBloom) and the index read overhead (statsFiles, statsBytes,
>>>>>>> > bloomPayloadBytes) for point predicates on high-cardinality columns.
>>>>>>> > Please take a look and let me know if you have any questions or
>>>>>>> > feedback.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> >
>>>>>>> > Huaxin
>>>>>>> >
>>>>>>> > On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]> wrote:
>>>>>>> >
>>>>>>> > > Reminder for tomorrow's sync on Iceberg Index Support.
>>>>>>> > >
>>>>>>> > > Wednesday: Feb. 11 9:00 – 10:00am
>>>>>>> > > Time zone: America/Los_Angeles
>>>>>>> > > Google Meet joining info
>>>>>>> > > Video call link: meet.google.com/nsp-ctyr-khk
>>>>>>> > > Design doc:
>>>>>>> > >
>>>>>>> > > https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>>>> > >
>>>>>>> > > https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > > Huaxin
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <[email protected]>
>>>>>>> > > wrote:
>>>>>>> > >
>>>>>>> > >> Thanks Huaxin and Steven for organizing this. Looking forward
>>>>>>> > >> to meeting you all next week!
>>>>>>> > >>
>>>>>>> > >> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>>>> > >>
>>>>>>> > >>> We set up the dev calendar event with a new google meet link.
>>>>>>> > >>> Please ignore the link from Huaxin's original email.
>>>>>>> > >>>
>>>>>>> > >>> The dev calendar has the correct info (including the new
>>>>>>> > >>> meeting link)
>>>>>>> > >>>
>>>>>>> > >>> Iceberg Index Support Sync
>>>>>>> > >>> Wednesday, February 11 · 9:00 – 10:00am
>>>>>>> > >>> Time zone: America/Los_Angeles
>>>>>>> > >>> Google Meet joining info
>>>>>>> > >>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>>>> > >>>
>>>>>>> > >>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>>>>>> > >>> wrote:
>>>>>>> > >>>
>>>>>>> > >>>> Sorry, I meant PST (not EST) :)
>>>>>>> > >>>> Looking forward to the discussion!
>>>>>>> > >>>>
>>>>>>> > >>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]>
>>>>>>> > >>>> wrote:
>>>>>>> > >>>>
>>>>>>> > >>>>> Hi Huaxin,
>>>>>>> > >>>>>
>>>>>>> > >>>>> Thanks for starting the sync!
>>>>>>> > >>>>>
>>>>>>> > >>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>>>> > >>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>>>> > >>>>> not EST. Maybe it's a typo?
>>>>>>> > >>>>> Otherwise, looking forward to the discussion!
>>>>>>> > >>>>>
>>>>>>> > >>>>> Best,
>>>>>>> > >>>>> Shawn
>>>>>>> > >>>>>
>>>>>>> > >>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]>
>>>>>>> > >>>>> wrote:
>>>>>>> > >>>>>
>>>>>>> > >>>>>> Hi all,
>>>>>>> > >>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>>>> > >>>>>> support. Here is the existing discussion thread:
>>>>>>> > >>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> To ground the discussion, here are the two proposals:
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> - Peter's proposal
>>>>>>> > >>>>>> <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>>> > >>>>>> (overall index support)
>>>>>>> > >>>>>> - My proposal
>>>>>>> > >>>>>> <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>>> > >>>>>> (bloom filter skipping index)
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST,
>>>>>>> > >>>>>> starting next Wednesday (2/11). After FileFormat sync finishes,
>>>>>>> > >>>>>> we plan to use that slot and switch to every other Monday, 9 AM
>>>>>>> > >>>>>> to 10 AM EST.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Thanks,
>>>>>>> > >>>>>> Huaxin
>>>>>>> > >>>>>>
>>>>>>> > >>>>>
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
