[ 
https://issues.apache.org/jira/browse/PARQUET-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718678#comment-17718678
 ] 

ASF GitHub Bot commented on PARQUET-2292:
-----------------------------------------

clairemcginty commented on code in PR #1078:
URL: https://github.com/apache/parquet-mr/pull/1078#discussion_r1182871206


##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
     }
   }
 
+  /**
+   * Returns the specific data model for a given SpecificRecord schema by 
reflecting the underlying
+   * Avro class's `MODEL$` field, or Null if the class is not on the classpath 
or reflection fails.
+   */
+  static SpecificData getModelForSchema(Schema schema) {
+    final Class<?> clazz;
+
+    if (schema != null && (schema.getType() == Schema.Type.RECORD || 
schema.getType() == Schema.Type.UNION)) {
+      clazz = SpecificData.get().getClass(schema);
+    } else {
+      return null;
+    }
+
+    final SpecificData model;
+    try {
+      final Field modelField = clazz.getDeclaredField("MODEL$");
+      modelField.setAccessible(true);
+
+      model = (SpecificData) modelField.get(null);
+    } catch (Exception e) {
+      return null;
+    }
+
+    try {
+      final String avroVersion = 
Schema.Parser.class.getPackage().getImplementationVersion();
+      // Avro 1.8 doesn't include conversions in the MODEL$ field
+      if (avroVersion.startsWith("1.8.")) {

Review Comment:
   > Since we are using reflections on private members there are no 
compatibility guarantees. We shall be very careful here. What about avro 
versions prior to 1.8? Also, what if it breaks in the future? Will the related 
unit test fail for a future Avro releases (in case of upgrading the Avro 
version in the pom)?
   
   so I've tested 1.7 and 1.8; since 1.9 Avro has stably used the `MODEL$` 
field to hold all conversions, so I feel reasonably confident about relying on 
this. If that changes, we'll catch it in the new unit tests 👍 
   
   If you want, I can surround invocations of `getModelForSchema` in a 
try/catch (in `AvroReadSupport`/`AvroWriteSupport`), and just use the default 
SpecificDataSupplier if they throw anything. That way any unexpected behavior 
would just result in logical types not being used.





> Improve default SpecificRecord model selection for Avro{Write,Read}Support
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-2292
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2292
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Claire McGinty
>            Assignee: Claire McGinty
>            Priority: Major
>
> AvroWriteSupport/AvroReadSupport can improve the precision of their default 
> `model` selection. Currently they default to new 
> SpecificDataSupplier().get()[0]. This means that SpecificRecord classes that 
> contain logical types will fail out-of-the-box unless a specific 
> DATA_SUPPLIER is configured that contains logical type conversions.
> I think we can improve this and make logical types work by default by 
> defaulting to the value of the `MODEL$` field that every SpecificRecordBase 
> implementation contains, which already contains all the logical conversions 
> for that Avro type. It would require reflection, but that's what the Avro 
> library is already doing to fetch models for Specific types[1].
>  
> [0] 
> [https://github.com/apache/parquet-mr/blob/d38044f5395494e1543581a4b763f624305d3022/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L403-L407]
> [1] 
> https://github.com/apache/avro/blob/release-1.11.1/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L76-L86



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to