reuvenlax commented on code in PR #27866:
URL: https://github.com/apache/beam/pull/27866#discussion_r1286591272
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -359,12 +362,22 @@ String getOrCreateStreamName() throws Exception {
}
AppendClientInfo generateClient(@Nullable TableSchema updatedSchema)
throws Exception {
- TableSchema tableSchema =
- (updatedSchema != null) ? updatedSchema :
getCurrentTableSchema(streamName);
Review Comment:
Good catch! This would indeed have been a bug.
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java:
##########
@@ -3252,11 +3286,34 @@ private <DestinationT> WriteResult expandTyped(
formatFunction = BigQueryUtils.toTableRow(input.getToRowFunction());
}
// Infer the TableSchema from the input Beam schema.
+ // TODO: If the user provided a schema, we should use that. There are
things that can be
+ // specified in a
+ // BQ schema that don't have exact matches in a Beam schema (e.g.
GEOGRAPHY types).
TableSchema tableSchema =
BigQueryUtils.toTableSchema(input.getSchema());
dynamicDestinations =
new ConstantSchemaDestinations<>(
dynamicDestinations,
StaticValueProvider.of(BigQueryHelpers.toJsonString(tableSchema)));
+ } else if (writeProtoClass != null) {
+ if (!hasSchema) {
Review Comment:
not quite the same - this would change the semantics of the final else
clause.
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -412,6 +413,26 @@ static SchemaInformation fromTableSchema(
.put(TableFieldSchema.Type.JSON, Type.TYPE_STRING)
.build();
+ static final Map<Descriptors.FieldDescriptor.Type, TableFieldSchema.Type>
+ PRIMITIVE_TYPES_PROTO_TO_BQ =
+ ImmutableMap.<Descriptors.FieldDescriptor.Type,
TableFieldSchema.Type>builder()
+ .put(Descriptors.FieldDescriptor.Type.INT32,
TableFieldSchema.Type.INT64)
+ .put(FieldDescriptor.Type.FIXED32, TableFieldSchema.Type.INT64)
+ .put(FieldDescriptor.Type.UINT32, TableFieldSchema.Type.INT64)
+ .put(FieldDescriptor.Type.SFIXED32, TableFieldSchema.Type.INT64)
+ .put(FieldDescriptor.Type.SINT32, TableFieldSchema.Type.INT64)
+ .put(Descriptors.FieldDescriptor.Type.INT64,
TableFieldSchema.Type.INT64)
Review Comment:
Done
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java:
##########
@@ -353,7 +353,7 @@ private static Object convertRequiredField(
case "BOOL":
case "BOOLEAN":
verify(v instanceof Boolean, "Expected Boolean, got %s", v.getClass());
- return v;
+ return v.toString();
Review Comment:
Storing objects in the JSON class makes the unit tests annoying to write.
Generally after round tripping through serialization the objects come back as
strings, and tests fail (since .equals() simply compares the objects). This can
be tricky to debug since things all format the same.
While this does somewhat increase the in-memory representation of the
object, these objects are usually dominated by everything else (e.g. the full
schema is stored with every object, and the map contains the full name of every
key).
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -709,7 +694,7 @@ static TableFieldSchema tableFieldSchemaFromDescriptorField(
if (fieldDescriptor.isRequired()) {
tableFieldSchemaBuilder =
tableFieldSchemaBuilder.setMode(TableFieldSchema.Mode.REQUIRED);
}
- if (fieldDescriptor.isOptional()) {
+ if (fieldDescriptor.isOptional() || !fieldDescriptor.isRequired()) {
tableFieldSchemaBuilder =
tableFieldSchemaBuilder.setMode(TableFieldSchema.Mode.NULLABLE);
}
Review Comment:
made these else/if
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]