Hey Daniel, I am working on reducing GET calls for small files. Currently we issue 3 GETs per file (2 for the footer and 1 for the actual data), when a single GET would be enough: if the file is small (I'm expecting small files to be <= 1 MB), we can fetch the whole file in one call and then serve both the footer read and the data reads from that buffer.
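To make the idea concrete, here is a minimal sketch of serving the footer and data reads from one in-memory copy of the file. It is not the code from the PR: the class and method names (`SmallFileFooter`, `footerLength`, `footer`, `range`) are hypothetical, and it assumes the whole object has already been fetched into a `byte[]` by a single GET. It relies only on the Parquet trailer layout, where the file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: all reads are served from one buffer that was
// filled by a single GET of the whole (small) file.
final class SmallFileFooter {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    // Leading magic (4) + footer length field (4) + trailing magic (4).
    private static final int MIN_FILE_SIZE = 12;

    private SmallFileFooter() {}

    /** Reads the footer length from the 4 bytes that precede the trailing magic. */
    static int footerLength(byte[] file) {
        int n = file.length;
        if (n < MIN_FILE_SIZE) {
            throw new IllegalArgumentException("buffer too small to be a Parquet file");
        }
        if (!Arrays.equals(Arrays.copyOfRange(file, n - 4, n), MAGIC)) {
            throw new IllegalArgumentException("missing trailing PAR1 magic");
        }
        int len = ByteBuffer.wrap(file, n - 8, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (len < 0 || len > n - MIN_FILE_SIZE) {
            throw new IllegalArgumentException("invalid footer length: " + len);
        }
        return len;
    }

    /** Returns a view of the serialized footer without another remote read. */
    static ByteBuffer footer(byte[] file) {
        int len = footerLength(file);
        return ByteBuffer.wrap(file, file.length - 8 - len, len).slice();
    }

    /** Returns a view of an arbitrary byte range, e.g. a column chunk. */
    static ByteBuffer range(byte[] file, int offset, int length) {
        return ByteBuffer.wrap(file, offset, length).slice();
    }
}
```

With this shape, the footer lookup and every later data read become slices of the same array, so the only remote call left is the initial fetch of the file.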
I have implemented an approach that cuts the run time by about 69% (537s to 169s) in JMH benchmarks over 1000 files (not partitioned) with 20,000,000 rows in total.
PR: https://github.com/apache/iceberg/pull/16729

I've also started a new thread at https://lists.apache.org/thread/yb8nom3w2zplb703m0p052kcc1wwotrr connecting this to the parquet-mr discussion (arrow-rs already exposes footer size hints that parquet-mr doesn't). Could you please take a look? I would appreciate your thoughts there.

--
Lakhyani Varun
Indian Institute of Technology Roorkee
Contact: +91 96246 46174
