I agree

Cheers,
Tushar Choudhary

On Wed, 13 Aug 2025 at 10:39 PM, Kevin Liu <kevinjq...@apache.org> wrote:

> Hey everyone,
>
> As discussed on the community sync today, there's enough interest around
> AAL, S3FileIO, and FileIO in general that we would like to schedule an
> ad hoc sync for this topic.
>
> I'll work with Michael to find a suitable time. We'll add it to the
> "Iceberg Dev Events"
> <https://iceberg.apache.org/community/#apache-iceberg-community-calendar>
> and also post on the devlist when we figure out more details.
>
> Best,
> Kevin Liu
>
> On Mon, Aug 11, 2025 at 8:00 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> Let's bring this up in the next community sync on Wed 8/13 and see if
>> it warrants another ad hoc sync about FileIO.
>>
>> Best,
>> Kevin Liu
>>
>> On Wed, Aug 6, 2025 at 1:21 PM Stubbs, Michael
>> <michs...@amazon.co.uk.invalid> wrote:
>>
>>> I think Kevin has raised a few good points on the future of FileIO
>>> and the maintainability of the project going forward with the AAL
>>> default on Proposal.
>>>
>>> I think we should schedule a community sync about this.
>>>
>>> Thank you!
>>>
>>> On 2025/07/31 13:12:48 Steve Loughran wrote:
>>> > On Fri, 25 Jul 2025 at 17:28, Kevin Liu <ke...@apache.org> wrote:
>>> >
>>> > *> I think it would be great to also make these improvements
>>> > available to older Iceberg clients.*
>>> >
>>> > Use the S3A connector and turn on vector reads through Parquet and
>>> > you currently get the same performance, about a 30% speedup in TPC
>>> > benchmarks (I know, but what else do we have?). The S3A connector is
>>> > going to move to making the AAL input stream the default in a future
>>> > release because it's a better architecture overall.
>>> >
>>> > The vector IO stuff can also give a speedup on Azure and abfs if
>>> > their connectors support it. (Oh, and local fs too, FWIW.)
>>> > Scatter/gather IO for the win.
>>> >
>>> > What AAL adds is format awareness, which could allow for extra
>>> > opportunities.
>>> >
>>> > As an aside, it'd be really good if FileIO added an overload of the
>>> > newInputFile() API call which passed in the file type too, so that
>>> > AAL &c would know what type to optimise for, rather than just guess
>>> > from the extension. Knowing what the v1 and maybe v2 schema offsets
>>> > are would save that GET request on the footer. AAL does a GET of a
>>> > range at the bottom with the goal of including the schema, but
>>> > knowing the exact range would be better.
>>> >
>>> > *> BTW, we have one-off community syncs about specific topics, I
>>> > would be interested to talk more about this as well as other
>>> > FileIOs. We use the "Iceberg Dev Events" calendar for scheduling if
>>> > there's interest.*
>>> >
>>> > I'd like that too.
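
For readers following the thread, here is a minimal sketch of the idea behind Steve's newInputFile() suggestion: let the caller pass the file type, and fall back to guessing from the extension only when no type is given. It is an illustration in Python, not Iceberg's FileIO API. The names `FileFormat`, `InputFile`, `guess_format`, and `new_input_file` are all hypothetical and do not come from Iceberg, S3A, or AAL.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FileFormat(Enum):
    """Hypothetical set of formats a reader might optimise for."""
    PARQUET = "parquet"
    ORC = "orc"
    AVRO = "avro"


@dataclass(frozen=True)
class InputFile:
    """Hypothetical handle carrying the format hint down to the stream layer."""
    location: str
    file_format: Optional[FileFormat]  # None means the format is unknown
    format_was_guessed: bool           # True if derived from the extension


def guess_format(location: str) -> Optional[FileFormat]:
    """Fallback: infer the format from the file extension, if any."""
    suffix = location.rsplit(".", 1)[-1].lower() if "." in location else ""
    for fmt in FileFormat:
        if fmt.value == suffix:
            return fmt
    return None


def new_input_file(location: str,
                   file_format: Optional[FileFormat] = None) -> InputFile:
    """Sketch of the proposed overload: prefer an explicit format over a guess."""
    if file_format is not None:
        return InputFile(location, file_format, format_was_guessed=False)
    guessed = guess_format(location)
    return InputFile(location, guessed, format_was_guessed=guessed is not None)
```

With an explicit hint, a file without a recognisable extension can still be opened as Parquet, for example `new_input_file("s3://bucket/table/data/00000", FileFormat.PARQUET)`. The footer or schema offsets mentioned in the thread could be passed the same way, but they are left out here because the thread does not specify their shape.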