reuvenlax commented on code in PR #24145:
URL: https://github.com/apache/beam/pull/24145#discussion_r1067613845
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/SplittingIterable.java:
##########
@@ -57,7 +84,37 @@ public ProtoRows next() {
while (underlyingIterator.hasNext()) {
StorageApiWritePayload payload = underlyingIterator.next();
ByteString byteString = ByteString.copyFrom(payload.getPayload());
-
+ if (autoUpdateSchema) {
+ try {
+ @Nullable TableRow unknownFields = payload.getUnknownFields();
+ if (unknownFields != null) {
+             // The protocol buffer serialization format supports concatenation.
+             // We serialize any newly "known" fields into a proto and
+             // concatenate it to the existing proto.
+ try {
+ byteString =
+ byteString.concat(
Review Comment:
The prior convert stage includes only the fields known to it in the proto it
generates. It cannot include fields it doesn't know about, since those would
have to appear in the proto descriptor (and it can't use the proto's unknown
field set, as that requires field ids, which are not known yet).
Therefore the incoming byteString contains only the fields that were known to
the convert stage; all other fields are placed into the unknownFields JSON
object. What we are doing here is taking advantage of the fact that the write
step has a more up-to-date view of the schema: we walk the unknownFields JSON
and extract whatever fields are now known (which might still be only a subset
of the remaining fields). We then serialize those fields into a second proto
and concatenate the two protos.
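
The wire-format property this relies on can be shown in isolation. Below is a
hypothetical, self-contained sketch (not Beam or protobuf-library code): it
hand-encodes two varint fields per the protobuf wire format, concatenates the
two byte strings, and decodes the result, showing that concatenation of
serialized messages behaves like a field-wise merge. All class and method
names here are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class ProtoConcatDemo {

  // Encode an unsigned value as a base-128 varint.
  static void writeVarint(ByteArrayOutputStream out, long n) {
    while ((n & ~0x7FL) != 0) {
      out.write((int) ((n & 0x7F) | 0x80));
      n >>>= 7;
    }
    out.write((int) n);
  }

  // Encode a single varint-typed field (wire type 0): tag varint, then value.
  static byte[] encodeField(int fieldNumber, long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarint(out, (long) fieldNumber << 3); // low 3 tag bits = wire type 0
    writeVarint(out, value);
    return out.toByteArray();
  }

  // Read one varint starting at offset i; returns {value, nextOffset}.
  static long[] readVarint(byte[] buf, int i) {
    long result = 0;
    int shift = 0;
    while (true) {
      int b = buf[i++] & 0xFF;
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return new long[] {result, i};
      }
      shift += 7;
    }
  }

  // Decode all varint fields in the buffer into fieldNumber -> value.
  static Map<Integer, Long> decode(byte[] buf) {
    Map<Integer, Long> fields = new LinkedHashMap<>();
    int i = 0;
    while (i < buf.length) {
      long[] tag = readVarint(buf, i);
      long[] value = readVarint(buf, (int) tag[1]);
      fields.put((int) (tag[0] >>> 3), value[0]);
      i = (int) value[1];
    }
    return fields;
  }

  public static void main(String[] args) {
    byte[] known = encodeField(1, 42);     // field known at convert time
    byte[] newlyKnown = encodeField(2, 7); // field resolved at write time
    byte[] merged = new byte[known.length + newlyKnown.length];
    System.arraycopy(known, 0, merged, 0, known.length);
    System.arraycopy(newlyKnown, 0, merged, known.length, newlyKnown.length);
    // Decoding the concatenation yields both fields, as if one message
    // had been serialized with both fields set.
    System.out.println(decode(merged));
  }
}
```

Note that for repeated concatenation of the *same* singular field, protobuf
semantics keep the last occurrence, which is why concatenating a proto holding
only the newly-known fields is safe here: the two protos carry disjoint field
sets.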
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]