theosib-amazon commented on code in PR #957:
URL: https://github.com/apache/parquet-mr/pull/957#discussion_r901733632
##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroReadSupport.java:
##########
@@ -136,10 +137,22 @@ public RecordMaterializer<T> prepareForRead(
GenericData model = getDataModel(configuration);
String compatEnabled = metadata.get(AvroReadSupport.AVRO_COMPATIBILITY);
- if (compatEnabled != null && Boolean.valueOf(compatEnabled)) {
- return newCompatMaterializer(parquetSchema, avroSchema, model);
+
+ try {
+ if (compatEnabled != null && Boolean.valueOf(compatEnabled)) {
+ return newCompatMaterializer(parquetSchema, avroSchema, model);
+ }
+ return new AvroRecordMaterializer<T>(parquetSchema, avroSchema, model);
+ } catch (InvalidRecordException | ClassCastException e) {
Review Comment:
I think the underlying problem is that some versions of ParquetMR produce
*bad schemas*. When we later try to load those same files, parsing fails
because the Parquet schema implicit in the file metadata doesn't match the
stored Avro schema. I'm not sure what to do about a bad schema other than to
throw it away and try a fallback.
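
To make the "throw away and fall back" idea concrete, here is a minimal, library-free sketch. Everything in it is hypothetical and only for illustration: `SchemaFallback`, `SchemaMismatchException` (standing in for something like `InvalidRecordException`), `buildWithFallback`, and the string "schemas" are not parquet-mr APIs. It assumes the fallback schema can be derived from the Parquet schema alone.

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class SchemaFallback {

    /** Hypothetical stand-in for a schema-mismatch failure. */
    static class SchemaMismatchException extends RuntimeException {
        SchemaMismatchException(String message) {
            super(message);
        }
    }

    /**
     * Try to build a materializer from the stored schema first. If that fails
     * with a mismatch or a ClassCastException, discard the stored schema and
     * retry once with a schema derived from another source.
     * The derived schema is computed lazily, only when the fallback is needed.
     */
    static <S, M> M buildWithFallback(S storedSchema,
                                      Supplier<S> derivedSchema,
                                      Function<S, M> factory) {
        try {
            return factory.apply(storedSchema);
        } catch (SchemaMismatchException | ClassCastException e) {
            // A failure on the second attempt propagates to the caller.
            return factory.apply(derivedSchema.get());
        }
    }

    public static void main(String[] args) {
        // Toy factory: rejects any schema whose name starts with "bad".
        Function<String, String> factory = schema -> {
            if (schema.startsWith("bad")) {
                throw new SchemaMismatchException("mismatch: " + schema);
            }
            return "materializer(" + schema + ")";
        };

        System.out.println(
            buildWithFallback("bad-avro-schema", () -> "derived-from-parquet", factory));
        System.out.println(
            buildWithFallback("good-avro-schema", () -> "derived-from-parquet", factory));
    }
}
```

One consequence of this shape is that a good stored schema never pays for the fallback, and a fallback that also fails still surfaces an exception instead of being swallowed.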