kkdoon commented on code in PR #22718:
URL: https://github.com/apache/beam/pull/22718#discussion_r947034447


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java:
##########
@@ -606,6 +607,28 @@ public static <T> TypedRead<T> read(SerializableFunction<SchemaAndRecord, T> par
         .build();
   }
 
+  /**
+   * Reads from a BigQuery table or query and returns a {@link PCollection} with one element per
+   * each row of the table or query result, where the custom {@link org.apache.avro.io.DatumReader}
+   * implementation is used to parse from the BigQuery AVRO format.
+   *
+   * <p> This API allows direct deserialization of AVRO data to the target class.
+   */
+  public static <T> TypedRead<T> readWithDatumReader(

Review Comment:
   How would we get the TypedRead in that case? Were you thinking of something like:
   ```
    BigQueryIO.readTableRows()
               .withDatumReaderFactory((AvroSource.DatumReaderFactory<User>) (writer, reader) -> new SpecificDatumReader<>(reader))
               .withReaderSchema(User.getAvroSchema())
   ```
   ? (This doesn't work as written, though.)
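   
   For context, here is a rough sketch of how the static factory proposed in this diff could be wired end to end. This is illustrative only: `User` stands in for a hypothetical Avro-generated `SpecificRecord` class, the table name is a placeholder, and the signature may still change in review.
   ```
   import org.apache.avro.specific.SpecificDatumReader;
   import org.apache.beam.sdk.Pipeline;
   import org.apache.beam.sdk.io.AvroSource;
   import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
   import org.apache.beam.sdk.options.PipelineOptionsFactory;
   import org.apache.beam.sdk.values.PCollection;

   public class ReadWithDatumReaderSketch {
     public static void main(String[] args) {
       Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

       // The factory is given the writer schema (read at runtime from the exported AVRO file
       // metadata) and the reader schema passed as the second argument to readWithDatumReader.
       PCollection<User> users =
           pipeline.apply(
               "ReadUsers",
               BigQueryIO.<User>readWithDatumReader(
                       (AvroSource.DatumReaderFactory<User>)
                           (writer, reader) -> new SpecificDatumReader<>(writer, reader),
                       User.getClassSchema())
                   .from("my-project:my_dataset.users"));

       pipeline.run().waitUntilFinish();
     }
   }
   ```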



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java:
##########
@@ -606,6 +607,28 @@ public static <T> TypedRead<T> read(SerializableFunction<SchemaAndRecord, T> par
         .build();
   }
 
+  /**
+   * Reads from a BigQuery table or query and returns a {@link PCollection} with one element per
+   * each row of the table or query result, where the custom {@link org.apache.avro.io.DatumReader}
+   * implementation is used to parse from the BigQuery AVRO format.
+   *
+   * <p> This API allows direct deserialization of AVRO data to the target class.
+   */
+  public static <T> TypedRead<T> readWithDatumReader(
+      AvroSource.DatumReaderFactory<T> factory, org.apache.avro.Schema readerSchema) {

Review Comment:
   It seems like AvroIO derives the reader schema when the class is a SpecificData subtype. We could do the same, but we would need to take the class type as input instead (I don't think that's cleaner, though).
   
   Also, readerSchema is needed by AvroSource in general: it validates against it when parseFn is not set, and it derives the AvroCoder from the readerSchema. Since the writer schema is only obtained at runtime from the file metadata, we cannot use the reader schema as the writer schema on the submitter side.
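   
   To make the alternative above concrete, here is a rough sketch of deriving the reader schema from a class type, roughly in the spirit of what AvroIO does for generated types. The helper name and the SpecificData-based derivation are my assumptions for illustration, not code from this PR:
   ```
   import org.apache.avro.Schema;
   import org.apache.avro.specific.SpecificData;
   import org.apache.avro.specific.SpecificRecord;

   public class ReaderSchemaDerivation {
     // Derives the reader schema from the target class when it is an Avro-generated
     // SpecificRecord; otherwise the caller would still have to pass a schema explicitly.
     static <T> Schema readerSchemaFor(Class<T> clazz) {
       if (SpecificRecord.class.isAssignableFrom(clazz)) {
         return new SpecificData(clazz.getClassLoader()).getSchema(clazz);
       }
       throw new IllegalArgumentException(
           "Cannot derive a reader schema for " + clazz + "; pass it explicitly.");
     }
   }
   ```
   The derived schema could then also back the coder (e.g. `AvroCoder.of(clazz, schema)`), but since this only helps for generated types, it isn't obviously cleaner than taking the schema directly.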


