Thanks for the nice suggestion, Dan. Yes, we have been using a technique similar to the one you describe below, and we arrived at the conclusion that there is no deterministic way to reach the optimal case, i.e., a single I/O call to S3 to read the Parquet footer.
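To make the trade-off concrete, here is a minimal sketch of the optimistic tail read, in the spirit of the readTail approach mentioned below. It is not Iceberg's API: `read_range` is a hypothetical callable standing in for a ranged GET against S3, and `tail_guess` is an arbitrary default. It relies only on the Parquet trailer layout (a 4-byte little-endian footer length followed by the `PAR1` magic).

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


def read_footer(read_range, file_size, tail_guess=64 * 1024):
    """Return (footer_bytes, number_of_reads).

    read_range(offset, length) is a hypothetical stand-in for one ranged
    read against the object store; it must return exactly `length` bytes.
    """
    if file_size < TRAILER_LEN + len(MAGIC):
        raise ValueError("file too small to be a Parquet file")

    # First (optimistic) read: grab a guessed-size tail of the file.
    tail_len = min(max(tail_guess, TRAILER_LEN), file_size)
    tail = read_range(file_size - tail_len, tail_len)
    reads = 1

    if tail[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if footer_len + TRAILER_LEN + len(MAGIC) > file_size:
        raise ValueError("footer length exceeds file size")

    if footer_len + TRAILER_LEN <= tail_len:
        # The guess covered the whole footer: one read was enough.
        start = tail_len - TRAILER_LEN - footer_len
        return tail[start:start + footer_len], reads

    # The guess was too small: a second read is unavoidable.
    footer = read_range(file_size - TRAILER_LEN - footer_len, footer_len)
    return footer, reads + 1
```

Whether the call finishes in one read depends entirely on `tail_guess` being at least as large as the footer plus the trailer, which is the non-deterministic part.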
Best,
Sreeram

On Tue, Jan 21, 2025 at 4:20 PM Daniel Weeks <dwe...@apache.org> wrote:

> Hey Sreeram,
>
> I think it's worthwhile to consider what value would be added by tracking
> the footer size in metadata, but there are other options to address these
> optimization use cases.
>
> For example, if you take a look at the RangeReadable
> <https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/io/RangeReadable.java#L68>
> interface for FileIO implementations, there's a readTail method so that
> you can optimistically read from the tail end of the file to try to fetch
> the full footer in a single read. This is even optimized in some of the
> implementations (like S3InputStream) to leverage backward reads as opposed
> to seek operations which might have overhead.
>
> Depending on the size of the file, you may want to load just the tail or
> the whole file to avoid all reads. Having the exact value definitely will
> make this more exact, but I feel like using the above approach can
> approximate the same performance benefits.
>
> Just a thought,
> -Dan
>
> On Tue, Jan 21, 2025 at 12:17 PM Sreeram Garlapati <
> gsreeramku...@gmail.com> wrote:
>
>> Hello Team!
>>
>> This is a small improvement proposal to store the *parquet footer size*
>> as part of the *data_file* metadata in the iceberg manifest
>> <https://iceberg.apache.org/spec/#manifests>.
>> *manifest_entry > (2) data_file > (146 Optional) footer_size_in_bytes*
>>
>> *Motivation*:
>>
>> - We have several sub-second read use cases on iceberg tables. We
>>   store icebergs and parquets on S3. Every hop to S3 is v.expensive
>>   (P99 of >200 milliseconds). Hence we are trying to see if we can
>>   optimize by cutting down any of these hops. One such hop is during
>>   the Parquet file read., the first read to the parquet, which is to
>>   read the last 8 bytes - to read the - footer size and par1 sequence.
>> - Iceberg metadata already includes the file_size_in_bytes. Including
>>   the footer size benefits all the readers. ie., readers can directly
>>   issue 1 I/O call to read the footer - *read_parquet_footer(filehandle,
>>   offset=file_size_in_bytes-footer_size_in_bytes-1)*
>> - This is similar to what we have in the iceberg specification in the
>>   case of storing Table statistics
>>   <https://iceberg.apache.org/spec/#table-statistics>, puffins >
>>   *file-footer-size-in-bytes*.
>> - This can be easily extended to ORC as needed too. Perhaps, in the
>>   ORC case, an additional property to store the postscript length is
>>   also needed.
>>
>> Truly appreciate your thoughts,
>> Sreeram <https://www.linkedin.com/in/sreeramgarlapati>
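
For reference, the single-read arithmetic in the proposal above can be sketched as follows. This is only an illustration: `footer_size_in_bytes` is the proposed field and does not exist in the spec today, `footer_read_range` is a hypothetical helper name, and the sketch assumes the field would count only the serialized footer metadata, excluding Parquet's 8-byte trailer (4-byte little-endian length plus the `PAR1` magic). Under that assumption the read starts at `file_size_in_bytes - footer_size_in_bytes - 8`.

```python
PARQUET_TRAILER_LEN = 8  # 4-byte little-endian footer length + b"PAR1"


def footer_read_range(file_size_in_bytes, footer_size_in_bytes):
    """Return (offset, length) of one ranged read covering footer + trailer.

    Assumes footer_size_in_bytes excludes the 8-byte trailer; if the field
    were defined to include it, PARQUET_TRAILER_LEN would not be added.
    """
    length = footer_size_in_bytes + PARQUET_TRAILER_LEN
    if footer_size_in_bytes < 0 or length > file_size_in_bytes:
        raise ValueError("footer size is inconsistent with file size")
    return file_size_in_bytes - length, length


# A 1000-byte file with a 100-byte footer: read 108 bytes from offset 892.
print(footer_read_range(1000, 100))  # (892, 108)
```

Reading the trailer along with the footer lets the reader still validate the magic and the recorded length in the same call.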