[ https://issues.apache.org/jira/browse/BEAM-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pablo Estrada reassigned BEAM-8953:
-----------------------------------

    Assignee: Ryan Berti

> Extend ParquetIO.Read/ReadFiles.Builder to support Avro GenericData model
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8953
>                 URL: https://issues.apache.org/jira/browse/BEAM-8953
>             Project: Beam
>          Issue Type: Improvement
>          Components: examples-java
>    Affects Versions: 2.16.0
>            Reporter: Ryan Berti
>            Assignee: Ryan Berti
>            Priority: Minor
>
> When using ParquetIO to deserialize objects into case classes in Scala, 
> we'd like to use a downstream converter that takes GenericRecords and 
> converts them to instances of our case classes, rather than relying on 
> ParquetIO to deserialize into the case class via reflection and an 
> implementation of the IndexedRecord interface.
> The ParquetIO.Read / ParquetIO.ReadFiles builders currently accept a 
> filepattern + schema and a schema, respectively. When the Read / ReadFiles 
> builders are used with these arguments, the underlying AvroParquetReader 
> created in ParquetIO.ReadFiles.ReadFn uses an AvroReadSupport instance 
> whose data model defaults to SpecificData. We'd like the underlying 
> AvroReadSupport to use the GenericData model instead, but there is 
> currently no way to force this through the existing ParquetIO Read / 
> ReadFiles builders.
> I'd like to extend the ParquetIO Read / ReadFiles builders with a new 
> method that lets users specify a GenericData model, which is then passed 
> into the AvroParquetReader builder. I've tested and validated that this 
> allows ParquetIO to produce GenericRecord instances without requiring that 
> the users' classes can be reflectively instantiated and initialized via 
> the IndexedRecord interface.
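The reader-level behaviour described above can be sketched as follows. This example assumes that parquet-avro, Avro and hadoop-common are on the classpath. It calls parquet-avro directly through `AvroParquetReader.Builder.withDataModel(GenericData.get())`, so each row comes back as a `GenericRecord` rather than being resolved through the default SpecificData model. It does not use any ParquetIO method. The class name, the `User` schema and the sample rows are hypothetical and exist only for illustration.

```java
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;

public class GenericParquetReadSketch {

  // Hypothetical schema, used only for this illustration.
  static final Schema USER_SCHEMA =
      SchemaBuilder.record("User").fields()
          .requiredString("name")
          .requiredInt("age")
          .endRecord();

  // Reads every row as a GenericRecord by passing GenericData explicitly,
  // instead of letting AvroReadSupport fall back to its default (SpecificData) model.
  static List<GenericRecord> readGeneric(Path file) throws Exception {
    List<GenericRecord> out = new ArrayList<>();
    try (ParquetReader<GenericRecord> reader =
        AvroParquetReader.<GenericRecord>builder(file)
            .withDataModel(GenericData.get())
            .build()) {
      GenericRecord rec;
      while ((rec = reader.read()) != null) {
        out.add(rec);
      }
    }
    return out;
  }

  // Writes two sample rows to a new temporary file, then reads them back generically.
  static List<GenericRecord> roundTrip() throws Exception {
    java.nio.file.Path dir = Files.createTempDirectory("parquet-sketch");
    Path file = new Path(dir.resolve("users.parquet").toUri());
    try (ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(file).withSchema(USER_SCHEMA).build()) {
      String[][] rows = {{"alice", "30"}, {"bob", "41"}};
      for (String[] row : rows) {
        GenericRecord r = new GenericData.Record(USER_SCHEMA);
        r.put("name", row[0]);
        r.put("age", Integer.parseInt(row[1]));
        writer.write(r);
      }
    }
    return readGeneric(file);
  }

  public static void main(String[] args) throws Exception {
    for (GenericRecord r : roundTrip()) {
      // A downstream converter (e.g. into a Scala case class) would map these fields.
      System.out.println(r.get("name") + " is " + r.get("age"));
    }
  }
}
```

String fields read this way may come back as Avro `Utf8` rather than `java.lang.String`, so a downstream converter should call `toString()` on them.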



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
