Abacn commented on code in PR #31125:
URL: https://github.com/apache/beam/pull/31125#discussion_r1581648996
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java:
##########
@@ -388,8 +398,22 @@ public synchronized BigQueryStorageStreamSource<T> getCurrentSource() {
 // The following line is required to trigger the `FailedPreconditionException` on which
 // the SplitReadStream validation logic depends. Removing it will cause incorrect
 // split operations to succeed.
- newResponseIterator.hasNext();
- storageClient.reportPendingMetrics();
+ Future<Boolean> future = executor.submit(newResponseIterator::hasNext);
+ try {
+ // The intended wait time is in sync with splitReadStreamSettings.setRetrySettings in
+ // StorageClientImpl.
+ future.get(30, TimeUnit.SECONDS);
Review Comment:
This corresponds to the config here:
https://github.com/apache/beam/blob/673da546c1465c931fdbbc5769e7d566ff55b4d8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L1735-L1742
That config is intended to set the API call timeout for the split request to
30s (and otherwise reject the split). However, a bug fix (#21739) added this
`hasNext()` call here, which has its own internal retry logic but is not
covered by the split-stream-retry config. When there is a quota issue, the call
keeps retrying and the pipeline gets stuck.
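
For readers who want to see the pattern in isolation, here is a minimal standalone sketch of bounding a potentially blocking `hasNext()` call with `Future.get(timeout, unit)`. It is not the Beam code: the class `BoundedHasNext`, the method `probeCompletedWithin`, and the `stuckIterator` helper are hypothetical names made up for illustration, and the short timeouts are chosen only so the demo finishes quickly.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedHasNext {

  /**
   * Runs iterator.hasNext() on the executor and waits at most the given timeout.
   * Returns true if the call completed in time, false if it timed out or threw.
   * (Hypothetical helper; a caller would treat "false" as "reject the split".)
   */
  static boolean probeCompletedWithin(
      ExecutorService executor, Iterator<?> iterator, long timeout, TimeUnit unit)
      throws InterruptedException {
    Callable<Boolean> probe = iterator::hasNext;
    Future<Boolean> future = executor.submit(probe);
    try {
      future.get(timeout, unit);
      return true;
    } catch (TimeoutException e) {
      // Interrupt the worker so it does not keep blocking after we give up.
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      // hasNext() itself failed; report the probe as unsuccessful.
      return false;
    }
  }

  /** Hypothetical stand-in for an iterator whose hasNext() blocks on internal retries. */
  static Iterator<Integer> stuckIterator() {
    return new Iterator<Integer>() {
      @Override
      public boolean hasNext() {
        try {
          Thread.sleep(60_000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        return false;
      }

      @Override
      public Integer next() {
        throw new NoSuchElementException();
      }
    };
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Iterator<Integer> fast = Collections.singletonList(1).iterator();
      System.out.println(probeCompletedWithin(executor, fast, 1, TimeUnit.SECONDS));
      System.out.println(
          probeCompletedWithin(executor, stuckIterator(), 200, TimeUnit.MILLISECONDS));
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Note that `future.cancel(true)` only helps if the blocked call responds to interruption; whether the real response iterator does is not something this sketch establishes.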