[
https://issues.apache.org/jira/browse/PARQUET-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718485#comment-17718485
]
ASF GitHub Bot commented on PARQUET-2292:
-----------------------------------------
gszadovszky commented on code in PR #1078:
URL: https://github.com/apache/parquet-mr/pull/1078#discussion_r1182232703
##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
}
}
+ /**
+ * Returns the specific data model for a given SpecificRecord schema by
reflecting the underlying
+ * Avro class's `MODEL$` field, or Null if the class is not on the classpath
or reflection fails.
+ */
+ static SpecificData getModelForSchema(Schema schema) {
+ final Class<?> clazz;
+
+ if (schema != null && (schema.getType() == Schema.Type.RECORD ||
schema.getType() == Schema.Type.UNION)) {
+ clazz = SpecificData.get().getClass(schema);
+ } else {
+ return null;
+ }
+
+ final SpecificData model;
+ try {
+ final Field modelField = clazz.getDeclaredField("MODEL$");
+ modelField.setAccessible(true);
+
+ model = (SpecificData) modelField.get(null);
+ } catch (Exception e) {
+ return null;
+ }
+
+ try {
+ final String avroVersion =
Schema.Parser.class.getPackage().getImplementationVersion();
+ // Avro 1.8 doesn't include conversions in the MODEL$ field
+ if (avroVersion.startsWith("1.8.")) {
+ final Field conversionsField = clazz.getDeclaredField("conversions");
+ conversionsField.setAccessible(true);
+
+ final Conversion<?>[] conversions = (Conversion<?>[])
conversionsField.get(null);
+
Arrays.stream(conversions).filter(Objects::nonNull).forEach(model::addLogicalTypeConversion);
+ }
+ } catch (Exception e) {
Review Comment:
Similar to the previous comment about logging and catching specific
exceptions.
##########
parquet-avro/src/test/java/org/apache/parquet/avro/TestSpecificReadWrite.java:
##########
@@ -237,6 +242,43 @@ public void testAvroReadSchema() throws IOException {
}
}
+ @Test
Review Comment:
We need to test both of `getModelForSchema` related to the avro version. If
the version check gets more complicated, maybe more versions are to cover.
##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
}
}
+ /**
+ * Returns the specific data model for a given SpecificRecord schema by
reflecting the underlying
+ * Avro class's `MODEL$` field, or Null if the class is not on the classpath
or reflection fails.
+ */
+ static SpecificData getModelForSchema(Schema schema) {
Review Comment:
TBH I do not have a strong opinion on any. I am fine with the current one if
it works.
##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
}
}
+ /**
+ * Returns the specific data model for a given SpecificRecord schema by
reflecting the underlying
+ * Avro class's `MODEL$` field, or Null if the class is not on the classpath
or reflection fails.
+ */
+ static SpecificData getModelForSchema(Schema schema) {
+ final Class<?> clazz;
+
+ if (schema != null && (schema.getType() == Schema.Type.RECORD ||
schema.getType() == Schema.Type.UNION)) {
+ clazz = SpecificData.get().getClass(schema);
+ } else {
+ return null;
+ }
+
+ final SpecificData model;
+ try {
+ final Field modelField = clazz.getDeclaredField("MODEL$");
+ modelField.setAccessible(true);
+
+ model = (SpecificData) modelField.get(null);
+ } catch (Exception e) {
+ return null;
+ }
+
+ try {
+ final String avroVersion =
Schema.Parser.class.getPackage().getImplementationVersion();
+ // Avro 1.8 doesn't include conversions in the MODEL$ field
+ if (avroVersion.startsWith("1.8.")) {
Review Comment:
Since we are using reflections on private members there are no compatibility
guarantees. We shall be very careful here. What about avro versions prior to
1.8? Also, what if it breaks in the future? Will the related unit test fail for
a future Avro releases (in case of upgrading the Avro version in the pom)?
##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
}
}
+ /**
+ * Returns the specific data model for a given SpecificRecord schema by
reflecting the underlying
+ * Avro class's `MODEL$` field, or Null if the class is not on the classpath
or reflection fails.
+ */
+ static SpecificData getModelForSchema(Schema schema) {
+ final Class<?> clazz;
+
+ if (schema != null && (schema.getType() == Schema.Type.RECORD ||
schema.getType() == Schema.Type.UNION)) {
+ clazz = SpecificData.get().getClass(schema);
+ } else {
+ return null;
+ }
+
+ final SpecificData model;
+ try {
+ final Field modelField = clazz.getDeclaredField("MODEL$");
+ modelField.setAccessible(true);
+
+ model = (SpecificData) modelField.get(null);
+ } catch (Exception e) {
Review Comment:
I think, we should log this exception. I would be also nice to use specific
exceptions instead of catching everything. WDYT?
> Improve default SpecificRecord model selection for Avro{Write,Read}Support
> --------------------------------------------------------------------------
>
> Key: PARQUET-2292
> URL: https://issues.apache.org/jira/browse/PARQUET-2292
> Project: Parquet
> Issue Type: Improvement
> Reporter: Claire McGinty
> Assignee: Claire McGinty
> Priority: Major
>
> AvroWriteSupport/AvroReadSupport can improve the precision of their default
> `model` selection. Currently they default to new
> SpecificDataSupplier().get()[0]. This means that SpecificRecord classes that
> contain logical types will fail out-of-the-box unless a specific
> DATA_SUPPLIER is configured that contains logical type conversions.
> I think we can improve this and make logical types work by default by
> defaulting to the value of the `MODEL$` field that every SpecificRecordBase
> implementation contains, which already contains all the logical conversions
> for that Avro type. It would require reflection, but that's what the Avro
> library is already doing to fetch models for Specific types[1].
>
> [0]
> [https://github.com/apache/parquet-mr/blob/d38044f5395494e1543581a4b763f624305d3022/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L403-L407]
> [1]
> https://github.com/apache/avro/blob/release-1.11.1/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L76-L86
--
This message was sent by Atlassian Jira
(v8.20.10#820010)