The current flatbuf footer prototype
<https://github.com/apache/arrow/pull/43793> uses relative offsets for
ColumnChunk and ColumnMetaData. Extending the same approach to the column
headers inside the column chunks themselves would fix problem (1) from the
blog post. Alternatively, we could avoid writing ColumnChunk and
ColumnMetaData there, since the footer already has them.
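
To illustrate why relative offsets help deduplication, here is a toy sketch.
It is not the Parquet layout or the flatbuf prototype: encode_chunk and its
8-byte offset field are hypothetical, made up only to show the effect. The
same page bytes are written at two different file positions, as would happen
if an earlier column grew in the second file.

```python
import struct


def encode_chunk(data: bytes, chunk_start: int, relative: bool) -> bytes:
    """Toy column chunk (hypothetical layout): page bytes followed by an
    8-byte little-endian field recording where the page bytes begin.

    With relative=False the field holds the absolute file position.
    With relative=True it holds the distance back from the field itself.
    """
    field_pos = chunk_start + len(data)
    offset = field_pos - chunk_start if relative else chunk_start
    return data + struct.pack("<q", offset)


page = b"same column values" * 4

# Identical content, shifted by 1000 bytes in the second file.
abs_a = encode_chunk(page, chunk_start=4, relative=False)
abs_b = encode_chunk(page, chunk_start=1004, relative=False)
rel_a = encode_chunk(page, chunk_start=4, relative=True)
rel_b = encode_chunk(page, chunk_start=1004, relative=True)

absolute_bytes_match = abs_a == abs_b
relative_bytes_match = rel_a == rel_b
print("absolute offsets, identical bytes:", absolute_bytes_match)
print("relative offsets, identical bytes:", relative_bytes_match)
```

With absolute offsets the serialized bytes differ even though the data is
the same, so a byte-level deduplicator sees two distinct blocks. With
relative offsets the bytes are identical regardless of position.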

On Wed, Oct 9, 2024 at 2:50 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> The Hugging Face developers published this insightful blog post about
> their attempts to deduplicate Parquet files when they have similar
> contents. They offer a couple suggestions for improvement at the end:
> https://huggingface.co/blog/improve_parquet_dedupe
>
> Regards
>
> Antoine.
>
>
>
