ahmedabu98 commented on issue #33238: URL: https://github.com/apache/beam/issues/33238#issuecomment-2523547865
Say a streaming pipeline has been running for a while, and then the table's schema gets updated. If the pipeline creates new write streams after this (e.g. autosharding determines we need more shards), those new write streams are created based on the original schema. We do not communicate to the new shards that we are now writing with a new schema. See the following code reference: https://github.com/apache/beam/blob/288c1569d1eca4e8e431255ab74c1ffb3d9b05fd/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java#L514-L530

`updatedSchema` is a ValueState that exists within the scope of a shard. It only gets initialized when a schema change happens within the lifetime of that shard.
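To make the failure mode concrete, here is a minimal toy model in plain Java. It is not Beam code: `Shard`, `onSchemaChange`, and `schemaForNewStream` are hypothetical names, and the `Optional` field only stands in for the per-shard `updatedSchema` ValueState. The sketch assumes a shard falls back to the original schema whenever its own state is empty.

```java
import java.util.Optional;

// Toy model only: Shard and its methods are hypothetical and are not Beam classes.
class ShardSchemaSketch {

    static final class Shard {
        // Stands in for the per-shard ValueState: empty until THIS shard observes a schema change.
        private Optional<String> updatedSchema = Optional.empty();

        void onSchemaChange(String newSchema) {
            updatedSchema = Optional.of(newSchema);
        }

        // Schema a shard would use when opening a write stream (assumed fallback to the original).
        String schemaForNewStream(String originalSchema) {
            return updatedSchema.orElse(originalSchema);
        }
    }

    // Returns {schema used by the long-lived shard, schema used by the late-created shard}.
    static String[] simulate() {
        String originalSchema = "v1"; // schema the pipeline was built with

        Shard existing = new Shard();
        existing.onSchemaChange("v2"); // table schema updated while this shard is alive

        Shard created = new Shard(); // e.g. added by autosharding after the update

        return new String[] {
            existing.schemaForNewStream(originalSchema),
            created.schemaForNewStream(originalSchema)
        };
    }

    public static void main(String[] args) {
        String[] schemas = simulate();
        System.out.println("existing shard writes with: " + schemas[0]);
        System.out.println("new shard writes with: " + schemas[1]);
    }
}
```

In this model the long-lived shard writes with `v2`, while the shard created after the update still writes with `v1`, because nothing outside the shard's own state carries the new schema.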
