Thanks Huaxin for posting the recording and the meeting notes.

I used this time to also address the questions collected during the sync:

   - Collected some representative use cases. See the example use-cases
   <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.i4gt8za99j9d>
   paragraph. Anyone should feel free to suggest their own.
   - Collected my thoughts about the writer requirements. See the writer
   requirements
   <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4b1p8r8nmfg1>
   paragraph.
   - Centralized the index maintenance related parts. See the index
   maintenance
   <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.hw2nt44i0k8q>
   paragraph.

It might be a bit premature, but I created a PR
<https://github.com/apache/iceberg/pull/15101> with the proposed
index-catalog-related changes, so those who are more code-oriented can take
a look at it too.

On Thu, Feb 19, 2026 at 5:34 AM huaxin gao <[email protected]> wrote:

> Hi Everyone,
>
> Here are the recording and notes from the Iceberg Index Support Sync on
> 2/11.
>
> Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>
> Notes:
> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>
> The meeting will move to biweekly, Mondays 9–10am PST, starting March 2.
>
> Since the sync, I updated the Bloom skipping index proposal
> <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
> to address the discussion questions, specifically:
>
>
>    - Performance justification: when this helps (high-cardinality = / IN,
>    many data files, high object-store latency) and how it differs from Parquet
>    row-group Bloom filters (which still require opening the data file).
>    - Cost / scalability: rough sizing (Bloom blob size per file, Puffin
>    file size), the planning cost trade-off (driver index reads vs executor
>    file opens), and mitigations via caching.
>    - Lifecycle / maintenance: incremental production as new data files
>    arrive, behavior when the index is missing/behind, and sharding/compaction
>    plus cleanup to avoid accumulating too many small Puffin files over time.
>    - Writer expectations: inline (optional) vs asynchronous (primary)
>    index creation.
>
> I also implemented a Spark 4.1 POC
> <https://github.com/apache/iceberg/pull/15311> and a local benchmark to
> quantify both the pruning impact (plannedFiles → afterBloom) and the index
> read overhead (statsFiles, statsBytes, bloomPayloadBytes) for point
> predicates on high-cardinality columns. Please take a look and let me know
> if you have any questions or feedback.
>
> Thanks,
>
> Huaxin
>
> On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]> wrote:
>
>> Reminder for tomorrow's sync on Iceberg Index Support.
>>
>> Wednesday: Feb. 11 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: meet.google.com/nsp-ctyr-khk
>> Design doc:
>>
>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>
>> https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>
>> Thanks,
>> Huaxin
>>
>>
>> On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <[email protected]>
>> wrote:
>>
>>> Thanks Huaxin and Steven for organizing this. Looking forward to meeting
>>> you all next week!
>>>
>>> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>
>>>> We set up the dev calendar event with a new google meet link. Please
>>>> ignore the link from Huaxin's original email.
>>>>
>>>> The dev calendar has the correct info (including the new meeting link)
>>>>
>>>> Iceberg Index Support Sync
>>>> Wednesday, February 11 · 9:00 – 10:00am
>>>> Time zone: America/Los_Angeles
>>>> Google Meet joining info
>>>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>
>>>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>>> wrote:
>>>>
>>>>> Sorry, I meant PST (not EST) :)
>>>>> Looking forward to the discussion!
>>>>>
>>>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Huaxin,
>>>>>>
>>>>>> Thanks for starting the sync!
>>>>>>
>>>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>>> not EST. Maybe it's a typo?
>>>>>> Otherwise, looking forward to the discussion!
>>>>>>
>>>>>> Best,
>>>>>> Shawn
>>>>>>
>>>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I'd like to start a dedicated sync to discuss Iceberg Index support.
>>>>>>> Here is the existing discussion thread:
>>>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty.
>>>>>>>
>>>>>>> To ground the discussion, here are the two proposals:
>>>>>>>
>>>>>>>    - Peter's proposal
>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>>>    (overall index support)
>>>>>>>    - My proposal
>>>>>>>    <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>>>    (bloom filter skipping index)
>>>>>>>
>>>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST, starting
>>>>>>> next Wednesday (2/11). After FileFormat sync finishes, we plan to use
>>>>>>> that slot and switch to every other Monday, 9 AM to 10 AM EST.
>>>>>>>
>>>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Huaxin
>>>>>>>
>>>>>>
