Daniel Oliveira created BEAM-13618:
--------------------------------------
Summary: Java BigQuery IO: DirectRead does not work with Beam
Schema support.
Key: BEAM-13618
URL: https://issues.apache.org/jira/browse/BEAM-13618
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Affects Versions: 2.35.0
Reporter: Daniel Oliveira
Currently in BigQueryIO, reads with Beam Schema support (for example, using
[readTableRowsWithSchema|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L553])
don't actually have schema support when DirectRead is the read method. This
appears to be because the expansion logic for DirectReads takes [a different
path|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1060]
that doesn't account for Beam schemas ([example of the code
handling Beam schemas in the default
path|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1204]).
Part of the reason for this is likely that the current approach to Beam Schema
support is to fetch a description of the BQ table's schema and convert it to
a Beam schema. However, with DirectRead, specific columns can be excluded from
the read, so the Beam schema needed no longer corresponds directly to the
table's schema. It would need to be constructed from the specific fields
selected for the read.
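
To illustrate the mismatch described above, here is a minimal sketch of deriving
a projected schema from a table schema and a list of selected fields. It does
not use the Beam or BigQuery APIs: the class SelectedFieldsSchema, the method
project, and the name-to-type map standing in for a schema are all hypothetical,
and the sketch assumes top-level columns only (no nested fields) and that
columns come back in table order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of BigQueryIO.
class SelectedFieldsSchema {

  // A schema is modeled here as an ordered map of column name -> type name.
  static Map<String, String> project(
      Map<String, String> tableSchema, List<String> selectedFields) {
    // Assumption: no selection means every column is read.
    if (selectedFields == null || selectedFields.isEmpty()) {
      return new LinkedHashMap<>(tableSchema);
    }
    // Reject selected fields that the table does not have.
    for (String name : selectedFields) {
      if (!tableSchema.containsKey(name)) {
        throw new IllegalArgumentException("Unknown field: " + name);
      }
    }
    // Assumption: keep the table's column order, dropping unselected columns.
    Map<String, String> projected = new LinkedHashMap<>();
    for (Map.Entry<String, String> column : tableSchema.entrySet()) {
      if (selectedFields.contains(column.getKey())) {
        projected.put(column.getKey(), column.getValue());
      }
    }
    return projected;
  }

  public static void main(String[] args) {
    Map<String, String> table = new LinkedHashMap<>();
    table.put("id", "INT64");
    table.put("name", "STRING");
    table.put("created_at", "TIMESTAMP");

    // The full table schema has three columns; the read returns only two.
    System.out.println(project(table, Arrays.asList("name", "id")));
    // prints {id=INT64, name=STRING}
  }
}
```

A schema converted from the full table description would still list
created_at, which is why it cannot be attached as-is to rows produced by a
read that selected only id and name.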
(As a side note, this is currently not documented anywhere, leading me to
believe this is an oversight or potential bug. I will add some documentation
indicating that schema support currently does not work with DirectRead.)
--
This message was sent by Atlassian Jira
(v8.20.1#820001)