Ryan Berti created BEAM-8953:
--------------------------------

             Summary: Extend ParquetIO.Read/ReadFiles.Builder to support Avro GenericData model
                 Key: BEAM-8953
                 URL: https://issues.apache.org/jira/browse/BEAM-8953
             Project: Beam
          Issue Type: Improvement
          Components: examples-java
    Affects Versions: 2.16.0
            Reporter: Ryan Berti


When using ParquetIO to deserialize records into Scala case classes, we'd like 
to use a downstream converter that takes GenericRecords and turns them into 
instances of our case classes. We'd rather not rely on ParquetIO to deserialize 
directly into the case class via reflection, which requires the class to 
implement the IndexedRecord interface.
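A rough sketch of the kind of downstream converter described above, written in Java rather than Scala. The User type stands in for a case class, and the class name, field names, and schema are all hypothetical. Avro string fields can arrive as either String or org.apache.avro.util.Utf8, so the sketch reads them through CharSequence.toString().

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class UserConverter {

  // Hypothetical target type standing in for a Scala case class.
  public static final class User {
    final String name;
    final long age;

    User(String name, long age) {
      this.name = name;
      this.age = age;
    }

    @Override
    public String toString() {
      return "User(" + name + ", " + age + ")";
    }
  }

  // Maps a GenericRecord onto the target type; no reflection or IndexedRecord needed.
  public static User fromGenericRecord(GenericRecord record) {
    // Strings may be String or Utf8 depending on how the record was produced.
    String name = record.get("name").toString();
    long age = ((Number) record.get("age")).longValue();
    return new User(name, age);
  }

  public static void main(String[] args) {
    // Hypothetical schema matching the User type above.
    Schema schema = SchemaBuilder.record("User").fields()
        .requiredString("name")
        .requiredLong("age")
        .endRecord();
    GenericRecord record = new GenericRecordBuilder(schema)
        .set("name", "ada")
        .set("age", 36L)
        .build();
    System.out.println(fromGenericRecord(record)); // prints User(ada, 36)
  }
}
```

In a pipeline, a function like this would typically be applied to the ParquetIO output with MapElements, and the target type would still need a Coder.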

The ParquetIO.Read and ParquetIO.ReadFiles builders currently accept a 
filepattern plus a schema, and a schema alone, respectively. When these 
builders are used, the underlying AvroParquetReader created in 
ParquetIO.ReadFiles.ReadFn defaults to an AvroReadSupport instance whose data 
model (a GenericData instance) is SpecificData. We'd like the underlying 
AvroReadSupport to use the GenericData model instead, but the existing 
ParquetIO Read / ReadFiles builders provide no way to force this.
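At the parquet-avro level, the data model can be pinned explicitly with AvroParquetReader.Builder.withDataModel. The sketch below shows that option in isolation. It is not ParquetIO's own code. It assumes parquet-avro, avro, and hadoop-common are on the classpath, and the class name, file name, and schema are hypothetical. It writes a one-record local file first so the read side has input.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;

public class GenericModelRead {

  // Hypothetical schema used only for this example.
  static final Schema SCHEMA = SchemaBuilder.record("User").fields()
      .requiredString("name")
      .requiredLong("age")
      .endRecord();

  // Writes one record to a fresh local Parquet file so there is something to read.
  static Path writeSample() throws IOException {
    File dir = Files.createTempDirectory("parquet-demo").toFile();
    Path path = new Path(new File(dir, "users.parquet").toURI());
    try (ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(path).withSchema(SCHEMA).build()) {
      writer.write(new GenericRecordBuilder(SCHEMA).set("name", "ada").set("age", 36L).build());
    }
    return path;
  }

  // Reads all records, pinning the Avro data model to GenericData rather
  // than leaving it to the reader's configured default.
  static List<GenericRecord> readAll(Path path) throws IOException {
    List<GenericRecord> records = new ArrayList<>();
    try (ParquetReader<GenericRecord> reader =
        AvroParquetReader.<GenericRecord>builder(path)
            .withDataModel(GenericData.get())
            .build()) {
      GenericRecord record;
      // read() returns null once the input is exhausted.
      while ((record = reader.read()) != null) {
        records.add(record);
      }
    }
    return records;
  }

  public static List<GenericRecord> demo() {
    try {
      return readAll(writeSample());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    for (GenericRecord r : demo()) {
      System.out.println(r);
    }
  }
}
```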


I'd like to extend the ParquetIO Read / ReadFiles builders with a new method 
that lets users specify a GenericData model, which would then be passed to the 
AvroParquetReader builder. I've tested and validated that this allows ParquetIO 
to produce GenericRecord instances without requiring that users' classes be 
reflectively instantiated and initialized via the IndexedRecord interface.
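One possible shape for the proposed option, sketched as a standalone class rather than a patch to ParquetIO. The class name and the method name withAvroDataModel are assumptions, not an existing Beam API. GenericData does not implement Serializable, and Beam serializes transforms and DoFns. For that reason the sketch stores the model's class and resolves the singleton through its static get() method (as GenericData.get() and SpecificData.get() provide) when the reader is built. If no model is set, the reader builder is left untouched and keeps its default.

```java
import java.io.Serializable;
import java.lang.reflect.InvocationTargetException;
import org.apache.avro.generic.GenericData;
import org.apache.parquet.avro.AvroParquetReader;

// Hypothetical option holder illustrating the proposed builder method.
public final class ReadOptionsSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // Null means "keep the reader's default data model".
  private final Class<? extends GenericData> modelClass;

  private ReadOptionsSketch(Class<? extends GenericData> modelClass) {
    this.modelClass = modelClass;
  }

  public static ReadOptionsSketch defaults() {
    return new ReadOptionsSketch(null);
  }

  // Returns a copy carrying the requested model, in the immutable-builder style.
  public ReadOptionsSketch withAvroDataModel(GenericData model) {
    return new ReadOptionsSketch(model.getClass());
  }

  // Resolves the model via its static get(); assumes the class declares one.
  public GenericData resolveModel() {
    if (modelClass == null) {
      return null;
    }
    try {
      return (GenericData) modelClass.getMethod("get").invoke(null);
    } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException("Cannot obtain instance of " + modelClass.getName(), e);
    }
  }

  // Applies the model to a reader builder only if one was configured.
  public <T> AvroParquetReader.Builder<T> applyTo(AvroParquetReader.Builder<T> builder) {
    GenericData model = resolveModel();
    return model == null ? builder : builder.withDataModel(model);
  }
}
```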



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
