On Fri, 25 Jul 2025 at 17:28, Kevin Liu <kevinjq...@apache.org> wrote:

*> I think it would be great to also make these improvements available to
older Iceberg clients.*

Use the S3A connector and turn on vector reads through parquet and you
currently get the same performance, about a 30% speedup in TPC benchmarks
(I know, but what else do we have?). The S3A connector is going to move to
making the AAL input stream the default in a future release because it's a
better architecture overall.

the vector IO stuff can also speed things up on azure and abfs if their
connectors support it. (oh and local fs too, FWIW). scatter/gather IO for
the win.
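
To make the scatter/gather point concrete, here is a rough sketch of the range coalescing that a vectored read can apply before issuing requests: nearby column-chunk ranges are merged so that several small reads become one larger one. The class and method names (RangeCoalescer, Range, coalesce, maxGap) are made up for illustration; this is not the Hadoop or AAL API, just the general idea.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; not the Hadoop vectored IO API. */
final class RangeCoalescer {
  private RangeCoalescer() {}

  /** A byte range within a file: [offset, offset + length). */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }

    long end() {
      return offset + length;
    }
  }

  /**
   * Sort the ranges by offset, then merge neighbours whose gap is at most
   * maxGap bytes. The caller reads each merged range once and slices the
   * originally requested ranges out of the result.
   */
  static List<Range> coalesce(List<Range> ranges, long maxGap) {
    List<Range> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
    List<Range> merged = new ArrayList<>();
    for (Range r : sorted) {
      if (!merged.isEmpty()) {
        Range last = merged.get(merged.size() - 1);
        if (r.offset - last.end() <= maxGap) {
          // Close enough: extend the previous range instead of adding a new read.
          long end = Math.max(last.end(), r.end());
          merged.set(merged.size() - 1,
              new Range(last.offset, (int) (end - last.offset)));
          continue;
        }
      }
      merged.add(r);
    }
    return merged;
  }
}
```

On an object store each merged range would become one ranged GET, so the trade-off is a few wasted bytes in the gaps against fewer round trips.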

what AAL adds is format awareness, which could allow for extra
opportunities.

As an aside, it'd be really good if FileIO added an overloaded newInputFile()
api call which passed in the file type too, so that AAL &c would know what
type to optimise for, rather than just guess from the extension. knowing what
the v1 and maybe v2 schema offsets are would save that GET request on the
footer. AAL does a GET of a range at the bottom with the goal of including
the schema, but knowing the exact range would be better.
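
Roughly what I mean by passing the type in, as a sketch only. Nothing here exists in Iceberg today: FileFormatHint, FormatResolver and resolve() are hypothetical names, and a real overload would hang off FileIO rather than a static helper. The point is just that a caller-supplied hint wins and the extension guess becomes the fallback.

```java
import java.util.Locale;

/** Hypothetical hint type; not part of Iceberg's FileIO. */
enum FileFormatHint { PARQUET, ORC, AVRO, UNKNOWN }

/** Hypothetical helper showing how a stream could choose a format to optimise for. */
final class FormatResolver {
  private FormatResolver() {}

  /** Fallback: guess from the extension, which is all a stream can do without a hint. */
  static FileFormatHint guessFromExtension(String path) {
    String p = path.toLowerCase(Locale.ROOT);
    if (p.endsWith(".parquet")) {
      return FileFormatHint.PARQUET;
    }
    if (p.endsWith(".orc")) {
      return FileFormatHint.ORC;
    }
    if (p.endsWith(".avro")) {
      return FileFormatHint.AVRO;
    }
    return FileFormatHint.UNKNOWN;
  }

  /** Prefer the caller's hint; only guess when no usable hint was given. */
  static FileFormatHint resolve(String path, FileFormatHint hint) {
    if (hint != null && hint != FileFormatHint.UNKNOWN) {
      return hint;
    }
    return guessFromExtension(path);
  }
}
```

With this, a data file written without a .parquet suffix would still get the parquet-specific read path as long as the caller says what it is.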

*> BTW, we have one-off community syncs about specific topics, I would be
interested to talk more about this as well as other FileIOs. We use the
"Iceberg Dev Events" calendar for scheduling if there's interest.*

I'd like that too.

