[https://issues.apache.org/jira/browse/BEAM-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993919#comment-16993919]
Ryan Berti edited comment on BEAM-8953 at 12/11/19 11:09 PM:
-------------------------------------------------------------
An example implementation would be:
* add an `abstract Builder setDataModel(GenericData model)` method to the builders
* at ParquetIO.java:221, pass the model value as the argument to AvroParquetReader.Builder#withDataModel:
[https://javadoc.io/doc/org.apache.parquet/parquet-avro/1.10.1/org/apache/parquet/avro/AvroParquetReader.Builder.html#withDataModel-org.apache.avro.generic.GenericData-]
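A minimal, JDK-only sketch of the builder plumbing described above, assuming the new option is stored as an unset-by-default field that is only forwarded when the user sets it, so existing pipelines keep the reader's current SpecificData default. `DataModel`, `ReadSpec`, and `withDataModel` here are hypothetical stand-ins, not Beam or Avro API; in the real change the field would hold an `org.apache.avro.generic.GenericData` instance.

```java
import java.util.Objects;
import java.util.Optional;

public class DataModelOptionSketch {

  // Hypothetical stand-in for the two Avro data models discussed in the issue
  // (SpecificData vs. GenericData); not a real Avro type.
  enum DataModel { SPECIFIC, GENERIC }

  // Hypothetical immutable read configuration, mirroring how a ParquetIO
  // Read/ReadFiles builder could carry the proposed option.
  static final class ReadSpec {
    private final String schemaJson;
    private final DataModel dataModel; // null means "not set by the user"

    private ReadSpec(String schemaJson, DataModel dataModel) {
      this.schemaJson = Objects.requireNonNull(schemaJson, "schemaJson");
      this.dataModel = dataModel;
    }

    static ReadSpec of(String schemaJson) {
      return new ReadSpec(schemaJson, null);
    }

    String schemaJson() {
      return schemaJson;
    }

    // Hypothetical counterpart of the proposed setDataModel(GenericData):
    // returns a modified copy and leaves this instance unchanged.
    ReadSpec withDataModel(DataModel model) {
      return new ReadSpec(schemaJson, Objects.requireNonNull(model, "model"));
    }

    // The model to forward to the reader builder, present only if the user set one.
    Optional<DataModel> dataModel() {
      return Optional.ofNullable(dataModel);
    }

    // The model the reader ends up using: the explicit one, or the reader's
    // default (SpecificData, per the issue description) when none was set.
    DataModel effectiveModel() {
      return dataModel().orElse(DataModel.SPECIFIC);
    }
  }

  public static void main(String[] args) {
    ReadSpec base = ReadSpec.of("{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}");
    ReadSpec generic = base.withDataModel(DataModel.GENERIC);
    System.out.println(base.effectiveModel());    // SPECIFIC
    System.out.println(generic.effectiveModel()); // GENERIC
  }
}
```

In ReadFn, forwarding would then amount to calling `withDataModel(model)` on the `AvroParquetReader` builder only when `dataModel()` is present. Leaving the option unset by default keeps the change backward compatible.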
> Extend ParquetIO.Read/ReadFiles.Builder to support Avro GenericData model
> -------------------------------------------------------------------------
>
> Key: BEAM-8953
> URL: https://issues.apache.org/jira/browse/BEAM-8953
> Project: Beam
> Issue Type: Improvement
> Components: examples-java
> Affects Versions: 2.16.0
> Reporter: Ryan Berti
> Priority: Minor
>
> When using ParquetIO to deserialize records into Scala case classes, we'd
> like to use a downstream converter that takes GenericRecords and converts
> them into instances of our case classes, rather than relying on ParquetIO to
> deserialize directly into the case class via reflection, which also requires
> the case class to implement the IndexedRecord interface.
> The ParquetIO.Read and ParquetIO.ReadFiles builders currently accept a
> filepattern plus a schema, and a schema, respectively. When the Read /
> ReadFiles builders are used with these arguments, the AvroParquetReader
> created in ParquetIO.ReadFiles.ReadFn uses an AvroReadSupport instance whose
> data model defaults to SpecificData. We'd like the underlying AvroReadSupport
> to use the GenericData model instead, but the existing ParquetIO Read /
> ReadFiles builders provide no way to force this.
> I'd like to extend the ParquetIO Read / ReadFiles builders with a new method
> that lets users set a GenericData model, which is then passed to the
> AvroParquetReader builder. I've tested and validated that this allows
> ParquetIO to produce GenericRecord instances without requiring that users'
> classes can be reflectively instantiated and initialized via the
> IndexedRecord interface.