Claire McGinty created PARQUET-2292:
---------------------------------------
Summary: Improve default SpecificRecord model selection for Avro{Write,Read}Support
Key: PARQUET-2292
URL: https://issues.apache.org/jira/browse/PARQUET-2292
Project: Parquet
Issue Type: Improvement
Reporter: Claire McGinty
AvroWriteSupport and AvroReadSupport could select their default `model` more
precisely. Currently they default to new SpecificDataSupplier().get()[0], which
does not carry any logical type conversions. As a result, SpecificRecord classes
that contain logical types fail out of the box unless a DATA_SUPPLIER that
registers the required logical type conversions is explicitly configured.
I think we can make logical types work by default by using the value of the
`MODEL$` field that every SpecificRecordBase implementation declares. That
model already has all the logical type conversions for that Avro type
registered. Reading it would require reflection, but that is what the Avro
library already does to fetch models for Specific types[1].
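
To make the proposal concrete, here is a minimal sketch of the reflective lookup, not the actual parquet-avro or Avro code. It is written against plain Java only so that it runs without Avro on the classpath: StandInModel, RecordWithModel, RecordWithoutModel, and findModel are hypothetical names invented for illustration, and the `MODEL$` field is typed as StandInModel here rather than as Avro's SpecificData. The empty result stands for the case where the caller would keep the current default.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Optional;

final class ModelLookup {

    /** Hypothetical stand-in for a data model (SpecificData in Avro). */
    static final class StandInModel {}

    /** Hypothetical stand-in for a generated class that declares MODEL$. */
    static final class RecordWithModel {
        private static final StandInModel MODEL$ = new StandInModel();
    }

    /** Hypothetical stand-in for a class that has no MODEL$ field. */
    static final class RecordWithoutModel {}

    /**
     * Reads the static MODEL$ field of the given class via reflection.
     * Returns an empty Optional when the field is missing, not static,
     * or not readable, so the caller can fall back to its existing default.
     */
    static Optional<Object> findModel(Class<?> recordClass) {
        try {
            Field field = recordClass.getDeclaredField("MODEL$");
            if (!Modifier.isStatic(field.getModifiers())) {
                return Optional.empty();
            }
            field.setAccessible(true); // the field is private in this sketch
            return Optional.ofNullable(field.get(null));
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(findModel(RecordWithModel.class).isPresent());
        System.out.println(findModel(RecordWithoutModel.class).isPresent());
    }
}
```

Returning an Optional rather than throwing keeps the fallback decision with the caller, which matters if some record classes turn out not to declare the field.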
[0] https://github.com/apache/parquet-mr/blob/d38044f5395494e1543581a4b763f624305d3022/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L403-L407
[1] https://github.com/apache/avro/blob/release-1.11.1/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L76-L86