On Thu, 12 May 2022 09:46:57 -0700
William Butler <[email protected]> wrote:
>
> From the JIRA, the converted type looks something like
>
> required group FeatureAmounts (MAP) {
>   repeated group map (MAP_KEY_VALUE) {
>     required binary key (STRING);
>     required binary key (STRING);
>   }
> }
>
> but the logical type looks like
>
> required group FeatureAmounts (MAP) {
>   repeated group map (UNKNOWN) {
>     required binary key (STRING);
>     required binary key (STRING);
>   }
> }
>
> Parquet C++ does not like that the UNKNOWN/NullLogicalType is being used in
> the groups and rejects the schema with an exception.

Well, why is UNKNOWN used here? This seems like a bug in the producer:
if MAP_KEY_VALUE does not have an equivalent logical type, then no
logical type annotation should be produced, instead of the "UNKNOWN"
logical type annotation which means that all values are null and the
"real" type of the data is therefore lost.

(I understand that this is probably due to confusion from the misnaming
of the "UNKNOWN" logical type, which would have been more appropriately
named "ALWAYS_NULL" or similar)

> The second example involves an INT64 column with a TIMESTAMP_MILLIS
> converted type but a String logical type. Parquet-mr in this example
> fallbacks to the timestamp converted type whereas Parquet C++ throws an
> exception.

Well, I don't know why a String logical type should be accepted for
integer columns with a timestamp converted type. The fact that
parquet-mr accepts it sounds like a bug in parquet-mr, IMO.

Regards

Antoine.
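
To illustrate the producer-side rule argued for above (no logical type
annotation when the converted type has no equivalent), here is a minimal
sketch in Python. It is not taken from parquet-mr or Parquet C++: the
table, the function name and the string labels are all hypothetical, and
only a few converted types are listed.

```python
from typing import Optional

# Hypothetical, partial table: legacy converted type -> equivalent logical
# type. MAP_KEY_VALUE is deliberately absent, since the point above is that
# it has no logical-type equivalent.
CONVERTED_TO_LOGICAL = {
    "UTF8": "STRING",
    "MAP": "MAP",
    "LIST": "LIST",
}


def logical_annotation(converted_type: Optional[str]) -> Optional[str]:
    """Return the logical type to write, or None to write no annotation.

    None means "omit the annotation entirely". It must never be turned
    into UNKNOWN, which asserts that every value in the column is null.
    """
    if converted_type is None:
        return None
    return CONVERTED_TO_LOGICAL.get(converted_type)


if __name__ == "__main__":
    for ct in ("MAP", "MAP_KEY_VALUE", None):
        print(ct, "->", logical_annotation(ct))
```

With this rule, the inner "map" group from the first example would carry
the MAP_KEY_VALUE converted type and no logical type at all.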
