Re: Proposal: Parquet footer size in Iceberg metadata

Sreeram Garlapati Tue, 21 Jan 2025 19:39:58 -0800

Thanks for the nice suggestion, Dan.
Yes, we have been employing a technique similar to the one you describe
below, and we arrived at the conclusion that there is no deterministic way
to achieve the optimal case, i.e., a single I/O call to S3 to read the
Parquet footer.


Best,
Sreeram

On Tue, Jan 21, 2025 at 4:20 PM Daniel Weeks <dwe...@apache.org> wrote:

> Hey Sreeram,
>
> I think it's worthwhile to consider what value would be added by tracking
> the footer size in metadata, but there are other options to address these
> optimization use cases.
>
> For example, if you take a look at the RangeReadable interface
> <https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/io/RangeReadable.java#L68>
> for FileIO implementations, there's a readTail method so that you can
> optimistically read from the tail end of the file to try to fetch the full
> footer in a single read.  This is even optimized in some of the
> implementations (like S3InputStream) to leverage backward reads as opposed
> to seek operations which might have overhead.
>
> Depending on the size of the file, you may want to load just the tail or
> the whole file to avoid additional reads.  Having the exact value would
> definitely make this more precise, but I feel the above approach can
> approximate the same performance benefits.
>
> Just a thought,
> -Dan
>
> On Tue, Jan 21, 2025 at 12:17 PM Sreeram Garlapati <
> gsreeramku...@gmail.com> wrote:
>
>> Hello Team!
>>
>> This is a small improvement proposal to store the *Parquet footer size*
>> as part of the *data_file* metadata in the Iceberg manifest
>> <https://iceberg.apache.org/spec/#manifests>:
>> *manifest_entry > (2) data_file > (146 Optional) footer_size_in_bytes*
>>
>> *Motivation*:
>>
>>    - We have several sub-second read use cases on Iceberg tables. We
>>    store the Iceberg and Parquet files on S3. Every hop to S3 is very
>>    expensive (P99 above 200 milliseconds), so we are trying to optimize
>>    by cutting down on these hops. One such hop is the first read of a
>>    Parquet file, which fetches the last 8 bytes: the footer size and
>>    the PAR1 magic sequence.
>>    - Iceberg metadata already includes file_size_in_bytes. Including
>>    the footer size benefits all readers, i.e., readers can directly
>>    issue a single I/O call to read the footer:
>>    *read_parquet_footer(filehandle,
>>    offset=file_size_in_bytes-footer_size_in_bytes-1)*
>>    - This is similar to what the Iceberg specification already does
>>    for table statistics
>>    <https://iceberg.apache.org/spec/#table-statistics>, where the Puffin
>>    file's *file-footer-size-in-bytes* is recorded.
>>    - This can be easily extended to ORC as needed too. Perhaps, in the
>>    ORC case, an additional property to store the postscript length is also
>>    needed.
>>
>> Truly appreciate your thoughts,
>> Sreeram <https://www.linkedin.com/in/sreeramgarlapati>
>>
>>
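
For reference, the two read strategies discussed in this thread (a speculative tail read versus a read at an offset computed from a known footer length) can be sketched roughly as follows. This is a minimal Python illustration, not Iceberg or Parquet library code: CountingStore, read_range, read_footer_speculative, and read_footer_exact are hypothetical names, and footer_len is assumed to mean the length of the footer metadata alone, excluding the 8-byte trailer (a 4-byte little-endian length followed by the PAR1 magic).

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


class CountingStore:
    """Hypothetical stand-in for an object store; counts range reads."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def read_range(self, offset: int, length: int) -> bytes:
        self.reads += 1
        return self.data[offset:offset + length]


def read_footer_speculative(store, file_size, guess=64 * 1024):
    """Read the last `guess` bytes; a second read is needed only if the
    footer turns out to be larger than the guess."""
    if file_size < TRAILER_LEN:
        raise ValueError("file too small to be Parquet")
    tail_len = min(max(guess, TRAILER_LEN), file_size)
    tail = store.read_range(file_size - tail_len, tail_len)
    if tail[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    needed = footer_len + TRAILER_LEN
    if needed > file_size:
        raise ValueError("corrupt footer length")
    if needed <= tail_len:
        # The guess covered the whole footer: one read in total.
        return tail[tail_len - needed:-TRAILER_LEN]
    # The guess was too small: fetch the footer bytes with a second read.
    return store.read_range(file_size - needed, footer_len)


def read_footer_exact(store, file_size, footer_len):
    """With the footer length known up front (e.g. from table metadata),
    exactly one read is issued regardless of footer size."""
    return store.read_range(file_size - footer_len - TRAILER_LEN, footer_len)
```

With a generous guess the speculative path usually completes in one read, but a footer larger than the guess costs a second round trip; the exact path never does, which is the non-determinism the reply at the top of this thread refers to.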
