There has not been an official release of the Parquet C++ library in quite
some time.  I don't think this is a huge issue as the parquet bits are
packaged into each Arrow release.

However, one practical concern is that when a bug crops up in a particular
version that writes a Parquet file, readers cannot mitigate it, because the
version string recorded in the file does not identify the affected builds.
One example is a long-standing bug (with a fix recently merged) where the
comparator for ByteArray/FLBA-encoded Decimals was incorrectly implemented.
This means min/max statistics for these Decimal values cannot be relied on.

I'd like to propose that we change the default version string [1] for
parquet-cpp to reflect Arrow releases (e.g. "parquet-cpp-arrow version
3.0.0" instead of "parquet-cpp version 1.5.1-snapshot").
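
To illustrate what a reader could do with such a string, here is a rough
sketch, not an existing API. The function names are hypothetical, and so is
the `first_fixed` parameter: the release that first carries the fix is not
stated here, so the caller has to supply it. The sketch also assumes writers
other than parquet-cpp are unaffected by this particular bug.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Matches the two example strings from this mail, e.g.
# "parquet-cpp version 1.5.1-snapshot" or "parquet-cpp-arrow version 3.0.0".
# Anything after major.minor.patch (such as "-snapshot") is ignored.
_CREATED_BY = re.compile(r"^(\S+) version (\d+)\.(\d+)\.(\d+)")


def parse_created_by(created_by: str) -> Optional[Tuple[str, Version]]:
    """Split a writer version string into (application, (major, minor, patch))."""
    m = _CREATED_BY.match(created_by.strip())
    if m is None:
        return None
    return m.group(1), (int(m.group(2)), int(m.group(3)), int(m.group(4)))


def decimal_stats_trustworthy(created_by: str, first_fixed: Version) -> bool:
    """Hypothetical reader-side policy for Decimal min/max statistics.

    first_fixed is an assumption supplied by the caller: the first Arrow
    release that contains the comparator fix.
    """
    parsed = parse_created_by(created_by)
    if parsed is None:
        # Unknown format: be conservative and ignore the statistics.
        return False
    app, version = parsed
    if app == "parquet-cpp":
        # The old string is the same across Arrow releases, so fixed and
        # unfixed writers cannot be told apart.
        return False
    if app == "parquet-cpp-arrow":
        # Tuples compare element by element, so (3, 0, 0) >= (2, 9, 9).
        return version >= first_fixed
    # Assumption: other writers are not affected by this parquet-cpp bug.
    return True
```

With the current string the reader has to distrust every parquet-cpp file;
with the proposed string it can distrust only the files written before the
fix.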

Any objections? An alternative would be to try to do releases of
parquet-cpp on the same timeline as Arrow releases.

Thanks,
Micah

[1]
https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/cpp/src/parquet/parquet_version.h.in
