clairemcginty commented on code in PR #1078:
URL: https://github.com/apache/parquet-mr/pull/1078#discussion_r1182871206
##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
}
}
+ /**
+ * Returns the specific data model for a given SpecificRecord schema by reflecting the underlying
+ * Avro class's `MODEL$` field, or Null if the class is not on the classpath or reflection fails.
+ */
+ static SpecificData getModelForSchema(Schema schema) {
+ final Class<?> clazz;
+
+ if (schema != null && (schema.getType() == Schema.Type.RECORD || schema.getType() == Schema.Type.UNION)) {
+ clazz = SpecificData.get().getClass(schema);
+ } else {
+ return null;
+ }
+
+ final SpecificData model;
+ try {
+ final Field modelField = clazz.getDeclaredField("MODEL$");
+ modelField.setAccessible(true);
+
+ model = (SpecificData) modelField.get(null);
+ } catch (Exception e) {
+ return null;
+ }
+
+ try {
+ final String avroVersion = Schema.Parser.class.getPackage().getImplementationVersion();
+ // Avro 1.8 doesn't include conversions in the MODEL$ field
+ if (avroVersion.startsWith("1.8.")) {
Review Comment:
> Since we are using reflection on private members there are no
> compatibility guarantees. We shall be very careful here. What about Avro
> versions prior to 1.8? Also, what if it breaks in the future? Will the related
> unit test fail for future Avro releases (in case of upgrading the Avro
> version in the pom)?

I've tested Avro 1.7 and 1.8. Since 1.9, Avro has consistently used the `MODEL$`
field to hold all conversions, so I feel reasonably confident relying on it.
If that changes, the new unit tests will catch it 👍

If you want, I can wrap the calls to `getModelForSchema` (in
`AvroReadSupport`/`AvroWriteSupport`) in a try/catch and fall back to the default
SpecificDataSupplier if they throw anything. That way, any unexpected behavior
would only result in logical types not being used.
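
To make the fallback idea concrete, here is a minimal sketch of the pattern, not the actual change. It uses only the JDK so it stands alone; `ModelFallback` and `resolveOrDefault` are hypothetical names, and the two suppliers stand in for the `getModelForSchema` call and the default-supplier path. It also treats a `null` result the same as a failure, since `getModelForSchema` returns null when the class can't be found.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of parquet-avro: illustrates "try the
// reflective lookup, fall back to the default on any failure".
final class ModelFallback {

    private ModelFallback() {}

    // lookup:   stands in for a call such as getModelForSchema(schema)
    // fallback: stands in for obtaining the default model
    static <T> T resolveOrDefault(Supplier<T> lookup, Supplier<T> fallback) {
        try {
            T model = lookup.get();
            // A null result means the lookup found nothing usable.
            return model != null ? model : fallback.get();
        } catch (RuntimeException | LinkageError e) {
            // Reflection on generated classes can fail with runtime exceptions
            // or linkage errors (for example, a class that fails to initialize).
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        String ok = ModelFallback.<String>resolveOrDefault(
            () -> "specific", () -> "default");
        String failed = ModelFallback.<String>resolveOrDefault(
            () -> { throw new IllegalStateException("reflection failed"); },
            () -> "default");
        System.out.println(ok + " / " + failed);
    }
}
```

With something like this at the call sites, a failed lookup degrades to the default model rather than failing the read or write.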