ahmedabu98 commented on code in PR #32514:
URL: https://github.com/apache/beam/pull/32514#discussion_r1771717574


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageQueryTest.java:
##########
@@ -381,7 +381,8 @@ private void doQuerySourceInitialSplit(
             .setParent("projects/" + options.getProject())
             .setReadSession(
                 ReadSession.newBuilder()
-                    
.setTable(BigQueryHelpers.toTableResourceName(tempTableReference)))
+                    
.setTable(BigQueryHelpers.toTableResourceName(tempTableReference))
+                    
.setReadOptions(ReadSession.TableReadOptions.newBuilder().build()))

Review Comment:
   ```suggestion
                       
.setReadOptions(ReadSession.TableReadOptions.newBuilder()))
   ```
   nit: we don't call `.build()` on `tableReadOptionsBuilder` in 
`BigQueryStorageSourceBase`. same with the below changes.



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java:
##########
@@ -182,30 +178,18 @@ public List<BigQueryStorageStreamSource<T>> split(
       LOG.info("Read session returned {} streams", 
readSession.getStreamsList().size());
     }
 
-    Schema sessionSchema;
-    if (readSession.getDataFormat() == DataFormat.ARROW) {
-      org.apache.arrow.vector.types.pojo.Schema schema =
-          ArrowConversion.arrowSchemaFromInput(
-              readSession.getArrowSchema().getSerializedSchema().newInput());
-      org.apache.beam.sdk.schemas.Schema beamSchema =
-          ArrowConversion.ArrowSchemaTranslator.toBeamSchema(schema);
-      sessionSchema = AvroUtils.toAvroSchema(beamSchema);
-    } else if (readSession.getDataFormat() == DataFormat.AVRO) {
-      sessionSchema = new 
Schema.Parser().parse(readSession.getAvroSchema().getSchema());
-    } else {
-      throw new IllegalArgumentException(
-          "data is not in a supported dataFormat: " + 
readSession.getDataFormat());
+    // TODO: this is inconsistent with method above, where it can be null
+    Preconditions.checkStateNotNull(targetTable);
+    TableSchema tableSchema = targetTable.getSchema();
+    if (selectedFieldsProvider != null && 
selectedFieldsProvider.isAccessible()) {
+      tableSchema = BigQueryUtils.trimSchema(tableSchema, 
selectedFieldsProvider.get());

Review Comment:
   Can you explain a little (maybe include in the PR description too) why it's 
better to trim based on `selectedFields` instead of the read session's schema? 
We pass selected fields as a component when building the read session, so I 
would think it returns a trimmed schema already?



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceDef.java:
##########
@@ -45,12 +45,23 @@ <T> BigQuerySourceBase<T> toSource(
       SerializableFunction<TableSchema, AvroSource.DatumReaderFactory<T>> 
readerFactory,
       boolean useAvroLogicalTypes);
 
+  /**
+   * Extract the {@link TableSchema} corresponding to this source.
+   *
+   * @param bqOptions BigQueryOptions
+   * @return table schema of the source
+   * @throws BigQuerySchemaRetrievalException if schema retrieval fails
+   */
+  TableSchema getTableSchema(BigQueryOptions bqOptions);
+
   /**
    * Extract the Beam {@link Schema} corresponding to this source.
    *
    * @param bqOptions BigQueryOptions
    * @return Beam schema of the source
    * @throws BigQuerySchemaRetrievalException if schema retrieval fails
    */
-  Schema getBeamSchema(BigQueryOptions bqOptions);
+  default Schema getBeamSchema(BigQueryOptions bqOptions) {
+    return BigQueryUtils.fromTableSchema(getTableSchema(bqOptions));
+  }

Review Comment:
   I think there's no need to keep this method if we're not using it anymore. 
These classes are package private so we don't have to worry about backwards 
compatibility.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to