reuvenlax commented on code in PR #33231:
URL: https://github.com/apache/beam/pull/33231#discussion_r1901100930
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java:
##########
@@ -531,6 +532,30 @@ public void process(
element.getKey().getKey(), dynamicDestinations,
datasetService);
tableSchema = converter.getTableSchema();
descriptor = converter.getDescriptor(false);
+
+ if (autoUpdateSchema) {
Review Comment:
so this doesn't quite solve the following race condition:
- schema S is updated to S' before creating stream writer, we detect it
in this codepath and store it in state.
- StreamWriter is destroyed (either because it is idle and evicted from
the cache, or because the worker crashes).
- We later recreate the StreamWriter. Because we have an updated schema
in cache, we don't execute this codepath.
In this case, I think we'll completely miss the new schema. IIUC the best
way to address this race would be for BigQuery to provide us with a version of
StreamWriter that returns schemas with new fields - i.e. not basing it off of
creation time.
##########
CHANGES.md:
##########
@@ -79,6 +79,8 @@
## Bugfixes
* Fixed X (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
+* Fixed EventTimeTimer ordering in Prism.
([#32222](https://github.com/apache/beam/issues/32222)).
Review Comment:
this doesn't seem to be in this PR
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java:
##########
@@ -1419,7 +1421,11 @@ public WriteStream createWriteStream(String tableUrn,
WriteStream.Type type)
@Override
public @Nullable WriteStream getWriteStream(String writeStream) {
- return newWriteClient.getWriteStream(writeStream);
+ return newWriteClient.getWriteStream(
Review Comment:
Correct, but if that's the case we should replace it with a getSchema
method. getWriteStream may be used in the future for other reasons, and if we
always fetch the schema it may cause performance issues.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]