TheNeuralBit commented on a change in pull request #14586:
URL: https://github.com/apache/beam/pull/14586#discussion_r634753508
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
##########
@@ -601,8 +602,17 @@ public static Read read() {
@Override
public TableRow apply(SchemaAndRecord schemaAndRecord) {
- return BigQueryAvroUtils.convertGenericRecordToTableRow(
- schemaAndRecord.getRecord(), schemaAndRecord.getTableSchema());
+ // TODO(BEAM-9114): Implement a function to encapsulate row conversion logic.
+ try {
+ return BigQueryAvroUtils.convertGenericRecordToTableRow(
+ schemaAndRecord.getRecord(), schemaAndRecord.getTableSchema());
+ } catch (IllegalStateException i) {
+ if (schemaAndRecord.getRow() != null) {
+ return BigQueryUtils.toTableRow().apply(schemaAndRecord.getRow());
+ }
+ throw new IllegalStateException(
+ "Record should be of instance GenericRecord (for Avro format) or of instance Row (for Arrow format), but it is not.");
+ }
Review comment:
Hi @MiguelAnzoWizeline - sorry I missed this. An Arrow `RecordBatch`
contains multiple records, and each one needs to map to its own `Row` instance.
I don't think you need to worry about that to do what I'm proposing, though:
the code you mentioned that converts a `VectorSchemaRoot` to `Iterable<Row>`
already handles it. That's why, for the ARROW path, `SchemaAndRecord`
contains a `Row` instance.
The problem is that for the AVRO path, `SchemaAndRecord` contains an Avro
`GenericRecord`, which is why we need this kind of hacky check here to handle
either `Row` or `GenericRecord`. What I'm suggesting is that we use the more
generic `Row` for this path too, since `AvroUtils` already has code for making
a `Row` out of a `GenericRecord`.
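To make the shape of the suggestion concrete, here is a rough, self-contained sketch. It deliberately does not use the real Beam or Avro classes: `Row`, `AvroLikeRecord`, `SchemaAndRow`, and every method below are hypothetical stand-ins, and `avroToRow` only plays the role of the existing `GenericRecord`-to-`Row` conversion. The point is just that if both read paths hand over a `Row`, the table-row conversion no longer needs a try/catch or a type check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowNormalizationSketch {

  /** Hypothetical stand-in for Beam's Row: ordered field name to value. */
  static final class Row {
    final Map<String, Object> fields;

    Row(Map<String, Object> fields) {
      this.fields = new LinkedHashMap<>(fields);
    }
  }

  /** Hypothetical stand-in for an Avro GenericRecord: parallel name and value lists. */
  static final class AvroLikeRecord {
    final List<String> names;
    final List<Object> values;

    AvroLikeRecord(List<String> names, List<Object> values) {
      if (names.size() != values.size()) {
        throw new IllegalArgumentException("names and values must have the same length");
      }
      this.names = new ArrayList<>(names);
      this.values = new ArrayList<>(values);
    }
  }

  /** Assumption: plays the role of the existing GenericRecord-to-Row conversion. */
  static Row avroToRow(AvroLikeRecord record) {
    Map<String, Object> fields = new LinkedHashMap<>();
    for (int i = 0; i < record.names.size(); i++) {
      fields.put(record.names.get(i), record.values.get(i));
    }
    return new Row(fields);
  }

  /** Hypothetical replacement for SchemaAndRecord that always carries a Row. */
  static final class SchemaAndRow {
    final Row row;

    SchemaAndRow(Row row) {
      this.row = row;
    }
  }

  /** AVRO path: convert to Row once, at the point where the record is read. */
  static SchemaAndRow fromAvro(AvroLikeRecord record) {
    return new SchemaAndRow(avroToRow(record));
  }

  /** ARROW path: the reader already produces Row instances. */
  static SchemaAndRow fromArrow(Row row) {
    return new SchemaAndRow(row);
  }

  /** Single conversion for both paths; a plain map stands in for TableRow. */
  static Map<String, Object> toTableRow(SchemaAndRow input) {
    return new LinkedHashMap<>(input.row.fields);
  }

  public static void main(String[] args) {
    SchemaAndRow avro =
        fromAvro(new AvroLikeRecord(List.of("name", "count"), List.of("a", 1L)));

    Map<String, Object> arrowFields = new LinkedHashMap<>();
    arrowFields.put("name", "a");
    arrowFields.put("count", 1L);
    SchemaAndRow arrow = fromArrow(new Row(arrowFields));

    // Both paths go through the same conversion and yield equal results.
    System.out.println(toTableRow(avro).equals(toTableRow(arrow)));
  }
}
```

In the actual change, the conversion on the AVRO path would happen wherever `SchemaAndRecord` is constructed, so that `apply` above could call `BigQueryUtils.toTableRow()` unconditionally.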