On Fri, 25 Jul 2025 at 17:28, Kevin Liu <kevinjq...@apache.org> wrote:
> I think it would be great to also make these improvements available to older Iceberg clients.

Use the S3A connector and turn on vectored reads in Parquet, and you currently get the same performance: about a 30% speedup in TPC benchmarks (I know, but what else do we have?).

The S3A connector is going to make the AAL input stream the default in a future release, because it's a better architecture overall.

The vector IO work can also deliver speedups on Azure and ABFS, if their connectors support it (oh, and on the local FS too, FWIW). Scatter/gather IO for the win.

What AAL adds is format awareness, which could open up extra optimisation opportunities.

As an aside, it would be really good if FileIO added an overload of the newInputFile() API call that also passed in the file type, so that AAL &c would know which format to optimise for, rather than guessing from the file extension.

Knowing the v1 (and maybe v2) schema offsets would save that GET request on the footer. AAL does a GET of a range at the end of the file with the goal of including the schema, but knowing the exact range would be better.

> BTW, we have one-off community syncs about specific topics, I would be interested to talk more about this as well as other FileIOs. We use the "Iceberg Dev Events" calendar for scheduling if there's interest.

I'd like that too.
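On the vector IO / scatter-gather point: one reason vectored reads help on object stores is that a reader can hand over all the column-chunk ranges it needs at once, and the stream can sort them and merge ranges that sit close together into fewer, larger GETs. The sketch below illustrates only that coalescing step. It is a simplified illustration, not Hadoop's implementation (which is reached through `PositionedReadable.readVectored()`); the `Range` record, `RangeCoalescer` class and `maxGap` parameter are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical byte range within a file: [offset, offset + length). */
record Range(long offset, long length) {
    long end() {
        return offset + length;
    }
}

/** Illustrative sketch of range coalescing; not Hadoop's actual code. */
final class RangeCoalescer {
    private RangeCoalescer() {}

    /**
     * Sorts the requested ranges by offset and merges any range that starts
     * within maxGap bytes of the end of the previous one (overlaps included),
     * so several small reads become fewer, larger requests.
     */
    static List<Range> coalesce(List<Range> requested, long maxGap) {
        List<Range> sorted = new ArrayList<>(requested);
        sorted.sort(Comparator.comparingLong(Range::offset));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                // Gap is negative when ranges overlap; that also merges.
                if (r.offset() - last.end() <= maxGap) {
                    long newEnd = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1,
                            new Range(last.offset(), newEnd - last.offset()));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }
}
```

For example, with `maxGap = 100`, the ranges (10000, 20), (0, 100) and (150, 50) become two requests: (0, 200) and (10000, 20). The design trade-off is reading a few unwanted gap bytes in exchange for fewer round trips, which is usually a good deal on high-latency stores.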
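The suggested newInputFile() overload could look roughly like the following. This is a hypothetical sketch, not Iceberg's FileIO API: `FormatHint`, `HintedInputFile`, `HintedFileIO` and `RecordingFileIO` are invented names. It shows the idea of the caller stating the file type explicitly, with the existing single-argument call kept as a default method that falls back to guessing from the extension.

```java
import java.util.Locale;

/** Hypothetical format hint; not Iceberg's own enum. */
enum FormatHint {
    PARQUET, ORC, AVRO, UNKNOWN;

    /** Fallback used today: guess the format from the file extension. */
    static FormatHint guessFromPath(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        if (p.endsWith(".parquet")) return PARQUET;
        if (p.endsWith(".orc")) return ORC;
        if (p.endsWith(".avro")) return AVRO;
        return UNKNOWN;
    }
}

/** Hypothetical input-file handle that carries the hint to the stream layer. */
record HintedInputFile(String location, FormatHint hint) {}

/** Hypothetical FileIO-style interface with the proposed overload. */
interface HintedFileIO {
    /** Proposed overload: the caller states the file type explicitly. */
    HintedInputFile newInputFile(String path, FormatHint hint);

    /** Existing-style call: falls back to guessing from the extension. */
    default HintedInputFile newInputFile(String path) {
        return newInputFile(path, FormatHint.guessFromPath(path));
    }
}

/** Trivial implementation that only records the hint it was given. */
final class RecordingFileIO implements HintedFileIO {
    @Override
    public HintedInputFile newInputFile(String path, FormatHint hint) {
        return new HintedInputFile(path, hint);
    }
}
```

A data file whose name carries no extension would only be recognised as Parquet when the caller passes `FormatHint.PARQUET` explicitly, which is the case the overload is meant to cover.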
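To make the footer point concrete: a (plaintext-footer) Parquet file ends with the footer metadata, then a 4-byte little-endian footer length, then the magic bytes `PAR1`. A speculative GET of the file's tail only avoids a second request if it happens to cover the whole footer, which is why knowing the exact range up front would help. The sketch below assumes that layout; `ParquetTail`, `footerStart` and `tailCoversFooter` are hypothetical names, not part of AAL or parquet-java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for reasoning about a speculative Parquet tail read. */
final class ParquetTail {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    /** 4-byte footer length followed by the 4-byte "PAR1" magic. */
    static final int TRAILER_LEN = 8;

    private ParquetTail() {}

    /**
     * Given the last tail.length bytes of a Parquet file of size fileLen,
     * returns the file offset at which the footer metadata starts.
     */
    static long footerStart(byte[] tail, long fileLen) {
        int n = tail.length;
        if (n < TRAILER_LEN) throw new IllegalArgumentException("tail too short");
        for (int i = 0; i < MAGIC.length; i++) {
            if (tail[n - MAGIC.length + i] != MAGIC[i]) {
                throw new IllegalArgumentException("missing PAR1 magic");
            }
        }
        // Footer length is stored little-endian just before the magic.
        int footerLen = ByteBuffer.wrap(tail, n - TRAILER_LEN, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        // Leave room for the leading "PAR1" magic at the start of the file.
        if (footerLen < 0 || footerLen > fileLen - TRAILER_LEN - MAGIC.length) {
            throw new IllegalArgumentException("bad footer length: " + footerLen);
        }
        return fileLen - TRAILER_LEN - footerLen;
    }

    /** True if the speculative tail read already contains the whole footer. */
    static boolean tailCoversFooter(byte[] tail, long fileLen) {
        return footerStart(tail, fileLen) >= fileLen - tail.length;
    }
}
```

For a 1000-byte file with a 300-byte footer, the footer starts at offset 692, so a 64-byte tail read would need a second GET while a 400-byte one would not.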