Hi,

Adding more metadata, especially the bits you propose, sounds like a good
idea. However, I'm not certain how useful it would be in practice. This is
primarily because I only use Avro files to store data that must be
processed in full at a later stage. The only thing I'm missing is quick
access to the last record, or a bit of metadata at the end of the file.

To my knowledge, adding min/max data, Bloom filters, etc. is about
querying efficiently. I mostly see Parquet files being used for this. What
are the use cases where Avro files would be better?
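
As a rough illustration of the querying case: the Python sketch below shows
how per-block min/max values could let a reader skip blocks that cannot
contain a value. It is not the Avro API. The names Block, min_key, max_key,
scan_equal and blocks_read are hypothetical, and records are simplified to
plain integers.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Block:
    # Hypothetical per-block metadata. An Avro block today only
    # carries a record count, not min/max values.
    min_key: int
    max_key: int
    records: List[int]


def scan_equal(blocks: List[Block], target: int) -> Iterator[int]:
    """Yield records equal to target, reading only blocks whose
    [min_key, max_key] range could contain it."""
    for block in blocks:
        if target < block.min_key or target > block.max_key:
            continue  # the whole block can be skipped without decoding
        for record in block.records:
            if record == target:
                yield record


def blocks_read(blocks: List[Block], target: int) -> int:
    """Count the blocks that a scan for target would have to read."""
    return sum(1 for b in blocks if b.min_key <= target <= b.max_key)


blocks = [
    Block(1, 9, [1, 5, 9]),
    Block(10, 19, [10, 15, 19]),
    Block(20, 29, [20, 25, 29]),
]
matches = list(scan_equal(blocks, 15))
```

In this toy data only one of the three blocks has to be read. That benefit
depends on the data being roughly clustered by the queried field. With
unsorted data, most block ranges overlap and little can be skipped.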


Kind regards,

Oscar
-- 
Oscar Westra van Holthe - Kind <opw...@apache.org>

On Tue, 17 Sep 2024 at 15:29, David <dam6...@gmail.com> wrote:

> Hello Gang,
>
> I've recently had some space to look at Avro again (I enjoy
> contributing to something that has such a wide industry impact).
>
> In thinking about the block format of Avro, it currently stores metadata
> about the number of records in each block. I'm performing a thought
> exercise of replacing the count field with a map and allowing for a more
> generic set of metadata. In particular, I would want to add better scan
> support: Bloom filters, min/max values.
>
> Making this backwards compatible looks hard at first, but does anyone in
> the community see value here?
>
>
> Thanks.
>
