Hi Everyone,

Here are the recording and notes from the Iceberg Index Support Sync on
2/11.

Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk

Notes:
https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3

The meeting will move to biweekly, Mondays 9–10am PST, starting March 2.

Since the sync, I updated the Bloom skipping index proposal
<https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
to address the discussion questions, specifically:


   - Performance justification: when this helps (high-cardinality = / IN,
   many data files, high object-store latency) and how it differs from Parquet
   row-group Bloom filters (which still require opening the data file).
   - Cost / scalability: rough sizing (Bloom blob size per file, Puffin
   file size), the planning cost trade-off (driver index reads vs executor
   file opens), and mitigations via caching.
   - Lifecycle / maintenance: incremental production as new data files
   arrive, behavior when the index is missing/behind, and sharding/compaction
   plus cleanup to avoid accumulating too many small Puffin files over time.
   - Writer expectations: inline (optional) vs asynchronous (primary) index
   creation.
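
To make the sizing bullet concrete, here is a back-of-the-envelope sketch in Python using the standard Bloom filter formulas. The inputs (distinct values per file, false-positive rate, file count) are hypothetical placeholders rather than figures from the proposal, and the result ignores compression and any Puffin framing overhead.

```python
import math


def bloom_size_bits(ndv: int, fpp: float) -> int:
    """Optimal bit count for ndv distinct values at false-positive rate fpp.

    Standard formula: m = -n * ln(p) / (ln 2)^2.
    """
    return math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))


def bloom_num_hashes(ndv: int, bits: int) -> int:
    """Optimal hash function count: k = (m / n) * ln 2, at least 1."""
    return max(1, round(bits / ndv * math.log(2)))


# Hypothetical inputs, chosen only for illustration.
ndv_per_file = 1_000_000   # distinct values of the indexed column per data file
fpp = 0.01                 # target false-positive probability
num_files = 10_000         # data files covered by the index

bits = bloom_size_bits(ndv_per_file, fpp)
num_hashes = bloom_num_hashes(ndv_per_file, bits)
blob_bytes = math.ceil(bits / 8)        # uncompressed Bloom blob per data file
total_bytes = blob_bytes * num_files    # uncompressed index payload overall

print(f"bits per file:   {bits}")
print(f"hash functions:  {num_hashes}")
print(f"blob per file:   {blob_bytes / 1e6:.2f} MB")
print(f"total payload:   {total_bytes / 1e9:.2f} GB")
```

With these placeholder inputs the per-file blob is a little over 1 MB, which is why sharding, caching, and the driver-side read cost matter once a table has many thousands of data files.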

I also implemented a Spark 4.1 POC
<https://github.com/apache/iceberg/pull/15311> and a local benchmark to
quantify both the pruning impact (plannedFiles → afterBloom) and the index
read overhead (statsFiles, statsBytes, bloomPayloadBytes) for point
predicates on high-cardinality columns. Please take a look and let me know
if you have any questions or feedback.
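
For anyone who wants the idea without reading the PR, the pruning step that the plannedFiles → afterBloom metric measures can be sketched roughly as below. This is a toy Python illustration, not the POC code: the `BloomFilter` class, the `prune_files` helper, the SHA-256 based hashing, and the file names are all hypothetical, and keeping a file whose index entry is missing is an assumption about the fallback behavior.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def prune_files(planned_files, index, literals):
    """Keep files that may match `col = v` / `col IN (...)` for the literals.

    Assumption: a file with no index entry (index missing or behind) is kept,
    so pruning never drops a file it cannot prove irrelevant.
    """
    kept = []
    for path in planned_files:
        bloom = index.get(path)
        if bloom is None or any(bloom.might_contain(v) for v in literals):
            kept.append(path)
    return kept


# Three indexed files with disjoint id ranges, plus one file not yet indexed.
planned = ["f0.parquet", "f1.parquet", "f2.parquet", "new.parquet"]
index = {}
for i, path in enumerate(planned[:3]):
    bloom = BloomFilter(num_bits=1 << 16, num_hashes=7)
    for value in range(i * 1000, (i + 1) * 1000):
        bloom.add(value)
    index[path] = bloom

after_bloom = prune_files(planned, index, literals=[1500])
print(f"plannedFiles={len(planned)} afterBloom={len(after_bloom)}")
```

The file that actually contains the literal and the unindexed file always survive pruning; the other files are dropped unless the filter returns a false positive.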

Thanks,

Huaxin

On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]> wrote:

> Reminder for tomorrow's sync on Iceberg Index Support.
>
> Wednesday: Feb. 11 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: meet.google.com/nsp-ctyr-khk
> Design doc:
>
> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>
> https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>
> Thanks,
> Huaxin
>
>
> On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <[email protected]>
> wrote:
>
>> Thanks Huaxin and Steven for organizing this. Looking forward to meeting
>> you all next week!
>>
>> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>
>>> We set up the dev calendar event with a new Google Meet link. Please
>>> ignore the link from Huaxin's original email.
>>>
>>> The dev calendar has the correct info (including the new meeting link):
>>>
>>> Iceberg Index Support Sync
>>> Wednesday, February 11 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>
>>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>> wrote:
>>>
>>>> Sorry, I meant PST (not EST) :)
>>>> Looking forward to the discussion!
>>>>
>>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Huaxin,
>>>>>
>>>>> Thanks for starting the sync!
>>>>>
>>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>> not EST. Maybe it's a typo?
>>>>> Otherwise, looking forward to the discussion!
>>>>>
>>>>> Best,
>>>>> Shawn
>>>>>
>>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I'd like to start a dedicated sync to discuss Iceberg Index support.
>>>>>> Here is the existing discussion thread:
>>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty.
>>>>>>
>>>>>> To ground the discussion, here are the two proposals:
>>>>>>
>>>>>>    - Peter's proposal
>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>>    (overall index support)
>>>>>>    - My proposal
>>>>>>    <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>>    (bloom filter skipping index)
>>>>>>
>>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST, starting
>>>>>> next Wednesday (2/11). After FileFormat sync finishes, we plan to use
>>>>>> that slot and switch to every other Monday, 9 AM to 10 AM EST.
>>>>>>
>>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>
>>>>>> Thanks,
>>>>>> Huaxin
>>>>>>
>>>>>
