Daniel Oliveira created BEAM-13618:
--------------------------------------

             Summary: Java BigQuery IO: DirectRead does not work with Beam 
Schema support.
                 Key: BEAM-13618
                 URL: https://issues.apache.org/jira/browse/BEAM-13618
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp
    Affects Versions: 2.35.0
            Reporter: Daniel Oliveira


Currently in BigQueryIO, reads with Beam Schema support (for example using 
[readTableRowsWithSchema|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L553])
 do not actually provide a Beam schema when DirectRead is the read method. This 
appears to be because the expansion logic for DirectReads takes [a different 
path|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1060]
 that does not account for Beam schemas at all ([example of the code 
handling Beam schemas in the default 
path|https://github.com/apache/beam/blob/v2.35.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1204]).

Part of the reason for this is likely that the current approach to Beam Schema 
support is to get a description of the BQ table's schema and then convert it to 
a Beam schema. However, with DirectRead, specific columns can be excluded while 
reading, so the Beam schema that is needed no longer corresponds directly to 
the table's schema. Instead, it would need to be constructed from the specific 
fields selected for the read.
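
To illustrate what "constructed from the specific fields selected" could mean, 
here is a minimal sketch in plain Java. It does not use any Beam or BigQuery 
API: the class name SelectedFieldsSchema, the project method, and the 
representation of a schema as a map from field name to type name are all 
hypothetical, chosen only for illustration. Keeping the fields in selection 
order and rejecting unknown field names are assumptions as well, not 
documented DirectRead behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for schema projection; not Beam or BigQuery API.
final class SelectedFieldsSchema {

  /**
   * Restricts a table schema (field name -> type name) to the selected fields.
   * A null or empty selection is treated as "read every column" (assumption).
   */
  static Map<String, String> project(
      Map<String, String> tableSchema, List<String> selectedFields) {
    if (selectedFields == null || selectedFields.isEmpty()) {
      // No projection requested: the full table schema applies.
      return new LinkedHashMap<>(tableSchema);
    }
    Map<String, String> projected = new LinkedHashMap<>();
    for (String name : selectedFields) {
      String type = tableSchema.get(name);
      if (type == null) {
        // Assumption: selecting a column the table lacks is an error.
        throw new IllegalArgumentException(
            "Selected field not in table schema: " + name);
      }
      // Insertion order of LinkedHashMap preserves the selection order.
      projected.put(name, type);
    }
    return projected;
  }

  public static void main(String[] args) {
    Map<String, String> table = new LinkedHashMap<>();
    table.put("id", "INT64");
    table.put("name", "STRING");
    table.put("created_at", "TIMESTAMP");

    // Only two of the three columns are selected, so the projected schema
    // has two fields and no longer matches the table schema one-to-one.
    System.out.println(project(table, Arrays.asList("id", "created_at")));
  }
}
```

In a real fix, the projected result would presumably be converted to a Beam 
schema in place of the full table schema, but how that would fit into the 
DirectRead expansion path is not something this sketch attempts to show.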

(As a side note, this limitation is currently not documented anywhere, which 
leads me to believe it is an oversight or a potential bug. I will add some 
documentation indicating that schema support currently does not work with 
DirectRead.)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
