Hi all,

(This is somewhat related to the discussion on the parquet mailing list about the compatibility of different features in the format and which format version they belong to, which triggered https://github.com/apache/parquet-format/pull/164. It is mainly related in the sense that it is rather unclear to me which features are enabled when, including for the column types.)

Currently, when writing parquet files with Arrow (parquet-cpp), we default to parquet format "1.0". This means we don't use the ConvertedTypes or LogicalTypes introduced in format "2.0"+. For example, we can (by default) only write the int32 and int64 integer types, and not any of the other signed and unsigned integer types.

However, most of the additional ConvertedTypes that were not present in parquet-format 1.0 (e.g. the different signed/unsigned integer types, timestamp, ..) were introduced in parquet-format 2.2 ( https://github.com/apache/parquet-format/pull/3, https://issues.apache.org/jira/browse/PARQUET-12) almost 7 years ago. The LogicalTypes are certainly more recent. But with the current options of "1.0" or "2.0" when writing, you can only choose either all new features of the different 2.x releases (both ConvertedTypes and LogicalTypes) or none of them (e.g. we don't have the option to say `version="2.2"`).

When we say "version 2.0 is not yet recommended for production use" (because many other readers are not yet compatible with it), isn't this mostly about the *data page* version 2 (which, AFAIU, is separate from the format version 2.0)?

If so, could we start defaulting to version 2.0 (but still with data page version 1.0)? Or do other parquet readers actually not yet support the ConvertedTypes introduced 7 years ago?

Best,
Joris