Hi,

Adding more metadata, especially the bits you propose, sounds like a great idea, but I'm not certain how useful it would be in practice. That is mainly because I only use Avro files to store data that must be processed in full at a later stage. The only thing I miss is quick access to the last record, or a bit of metadata at the end of the file.

To my knowledge, adding min/max data, bloom filters, etc. is about querying efficiently. I mostly see Parquet files being used for this. What are the use cases where Avro files would be better?

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind <opw...@apache.org>

On Tue, 17 Sep 2024 at 15:29, David <dam6...@gmail.com> wrote:

> Hello Gang,
>
> I've recently had some space to look at Avro again (I enjoy
> contributing to something that has such a wide industry impact).
>
> In thinking about the block format of Avro, it currently stores metadata
> about the number of records in each block. I'm performing a thought
> exercise of replacing the count field with a map and allowing for a more
> generic set of metadata. In particular, would want to add better scan
> support: Bloom filters, min, max values.
>
> Making this backwards compatible looks hard at first, but does anyone in
> the community see value here?
>
>
> Thanks.
>
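
To make the "scan support" idea in the quoted proposal concrete, here is a minimal sketch of how per-block min/max metadata could let a reader skip blocks during a range query. It is purely illustrative and does not use the Avro library or its file format: the `Block` class, its `min_value`/`max_value` fields, and `scan_range` are hypothetical names, and records are plain integers for simplicity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Block:
    """Hypothetical data block: records plus per-block min/max metadata."""
    records: List[int]  # assumed non-empty
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # A writer would compute these once and store them with the block.
        self.min_value = min(self.records)
        self.max_value = max(self.records)


def scan_range(blocks: List[Block], low: int, high: int) -> Iterator[int]:
    """Yield records in [low, high], skipping blocks that cannot match."""
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # the whole block is skipped without reading its records
        for record in block.records:
            if low <= record <= high:
                yield record


blocks = [Block([1, 5, 9]), Block([20, 25, 31]), Block([40, 41, 47])]
matches = list(scan_range(blocks, 22, 42))
print(matches)
```

In this sketch the first block is never examined record by record, which is the saving the proposal is after. When every record has to be processed anyway, as in the use case described in the reply, no block can be skipped and the extra metadata brings no benefit.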