Hey everyone,

As discussed on the community sync today, there's enough interest around AAL, S3FileIO, and FileIO in general that we would like to schedule an ad hoc sync for this topic.
I'll work with Michael to find a suitable time. We'll add it to the "Iceberg Dev Events" calendar <https://iceberg.apache.org/community/#apache-iceberg-community-calendar> and also post on the dev list when we figure out more details.

Best,
Kevin Liu

On Mon, Aug 11, 2025 at 8:00 AM Kevin Liu <kevinjq...@apache.org> wrote:

> Let's bring this up in the next community sync on Wed 8/13 and see if it warrants another ad hoc sync about FileIO.
>
> Best,
> Kevin Liu
>
> On Wed, Aug 6, 2025 at 1:21 PM Stubbs, Michael <michs...@amazon.co.uk.invalid> wrote:
>
>> I think Kevin has raised a few good points on the future of FileIO and the maintainability of the project going forward with the AAL default-on proposal.
>>
>> I think we should schedule a community sync about this.
>>
>> Thank you!
>>
>> On 2025/07/31 13:12:48 Steve Loughran wrote:
>>
>> > On Fri, 25 Jul 2025 at 17:28, Kevin Liu <ke...@apache.org> wrote:
>> >
>> > *> I think it would be great to also make these improvements available to older Iceberg clients.*
>> >
>> > Use the S3A connector and turn on vector reads through Parquet and you currently get the same performance, about a 30% speedup in TPC benchmarks (I know, but what else do we have?). The S3A connector is going to move to making the AAL input stream the default in a future release because it's a better architecture overall.
>> >
>> > The vector IO stuff can also speed things up on Azure and abfs if their connectors support it (oh, and local fs too, FWIW). Scatter/gather IO for the win.
>> >
>> > What AAL adds is format awareness, which could allow for extra opportunities.
>> >
>> > As an aside, it'd be really good if FileIO added an overloaded newInputFile() API call which passed in the file type too, so that AAL &c would know what type to optimise for, rather than just guess from the extension. Knowing what the v1 and maybe v2 schema offsets are would save that GET request on the footer. AAL does a GET of a range at the bottom with the goal of including the schema, but knowing the exact range would be better.
>> >
>> > *> BTW, we have one-off community syncs about specific topics, I would be interested to talk more about this as well as other FileIOs. We use the "Iceberg Dev Events" calendar for scheduling if there's interest.*
>> >
>> > I'd like that too.
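
To make the newInputFile() suggestion in Steve's message a bit more concrete ahead of the sync, here is a rough sketch of what a file-type-aware overload might look like. None of these names come from Iceberg: FileTypeHint, SketchInputFile, and SketchFileIO are hypothetical stand-ins invented for illustration, and the real FileIO and InputFile interfaces carry more than is shown here. The only point is that a caller who already knows the format can pass it in, while the existing single-argument call falls back to guessing from the extension.

```java
// Hypothetical file-type hint; not part of Iceberg's FileIO API.
enum FileTypeHint { PARQUET, AVRO, ORC, UNKNOWN }

// Simplified stand-in for an input file handle, for illustration only.
final class SketchInputFile {
  private final String location;
  private final FileTypeHint hint;

  SketchInputFile(String location, FileTypeHint hint) {
    this.location = location;
    this.hint = hint;
  }

  String location() { return location; }

  FileTypeHint hint() { return hint; }
}

// Simplified stand-in for a FileIO-like interface (assumption, not the real one).
interface SketchFileIO {
  // Proposed-style overload (hypothetical): the caller states the file type.
  SketchInputFile newInputFile(String path, FileTypeHint hint);

  // Existing-style call: no type information, so guess from the extension.
  default SketchInputFile newInputFile(String path) {
    return newInputFile(path, guessFromExtension(path));
  }

  // Extension-based guess, used only when the caller gives no hint.
  static FileTypeHint guessFromExtension(String path) {
    String lower = path.toLowerCase(java.util.Locale.ROOT);
    if (lower.endsWith(".parquet")) return FileTypeHint.PARQUET;
    if (lower.endsWith(".avro")) return FileTypeHint.AVRO;
    if (lower.endsWith(".orc")) return FileTypeHint.ORC;
    return FileTypeHint.UNKNOWN;
  }
}
```

In this sketch an implementation only has to provide the two-argument method, so existing callers of the one-argument form would keep working unchanged. Whether a hint like this should also carry footer or schema offsets is a separate question that the sketch does not try to answer.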